Exercises and Solutions, Exercises of Applied Statistics

Indian Statistical Institute Applied Statistics

Revision exercise set for Applied Statistics, covering probability distributions, PDFs/CDFs, maximum likelihood estimation, descriptive statistics, hypothesis testing, goodness-of-fit tests, and joint distributions, with worked solutions and explanations.

Typology: Exercises

2022/2023

Uploaded on 12/16/2025

venkataramanan 🇮🇳


Applied Statistics (ECS764P) - Revision exercises

Fredrik Dahlqvist

14 December 2022


1. Basic definitions. Consider the density function given by

$$f(x) = \begin{cases} \alpha x & \text{if } 0 \le x \le 2 \\ 0 & \text{else} \end{cases}$$

- For what value of $\alpha$ does $f$ define a probability density function?
- Using this value of $\alpha$, compute the CDF of the distribution.
- Still using this value of $\alpha$, compute the probability mass of the interval $[0, 1]$ and the probability mass of the interval $[1/2, 3/2]$.

Answer:

- For $f$ to define a probability density function we need

$$\int f(t)\,dt = 1.$$

This means that we need

$$\left[\frac{\alpha t^2}{2}\right]_0^2 = \alpha\,\frac{4}{2} = 1$$

and thus $\alpha = 1/2$.

- By definition, the CDF is given by

$$F(t) = \int_{-\infty}^{t} f(s)\,ds = \begin{cases} 0 & t < 0 \\ \int_0^t \frac{s}{2}\,ds = \frac{t^2}{4} & 0 \le t \le 2 \\ 1 & t > 2 \end{cases}$$

- The probability mass of $[0, 1]$ is just the CDF evaluated at 1, i.e. $F(1) = 1/4$. The probability mass of $[1/2, 3/2]$ is given by

$$\int_{1/2}^{3/2} f(t)\,dt = \frac{1}{2}$$

which can be computed directly, or we can use $F(3/2) - F(1/2) = \frac{9}{16} - \frac{1}{16} = \frac{1}{2}$.
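The arithmetic of this first exercise can be double-checked with exact fractions. The following is only a minimal sketch: the names `ALPHA`, `cdf` and `interval_mass` are illustrative choices, not part of the exercise sheet.

```python
from fractions import Fraction

# Normalising constant found in the answer: alpha = 1/2.
ALPHA = Fraction(1, 2)


def cdf(t):
    """CDF of the density f(x) = ALPHA * x on [0, 2], i.e. t^2 / 4 there."""
    t = Fraction(t)
    if t < 0:
        return Fraction(0)
    if t > 2:
        return Fraction(1)
    # Antiderivative of ALPHA * s evaluated between 0 and t.
    return ALPHA * t * t / 2


def interval_mass(a, b):
    """Probability mass of [a, b], computed as F(b) - F(a)."""
    return cdf(b) - cdf(a)


total_mass = cdf(2)
mass_0_1 = interval_mass(0, 1)
mass_half_to_three_halves = interval_mass(Fraction(1, 2), Fraction(3, 2))
print(total_mass, mass_0_1, mass_half_to_three_halves)
```

Using `Fraction` avoids floating-point rounding, so the results can be compared directly with the fractions in the answer.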

2. Basic definitions. Consider the CDF defined by

$$F(t) = \begin{cases} 0 & \text{if } t < 0 \\ 3t & \text{if } 0 \le t \le 1/3 \\ 1 & \text{else} \end{cases}$$

- Compute the corresponding PDF. Can you recognise this distribution?
- Check that it does indeed define a probability density function.
- If $3t$ were replaced by $6t$ in the definition above (and everything else remained the same), would $F(t)$ be a CDF? Justify your answer in one sentence.

Answer:

- By the fundamental theorem of calculus,

$$f(t) = \frac{d}{dt} F(t) = \begin{cases} 0 & t < 0 \\ 3 & 0 \le t \le 1/3 \\ 0 & t > 1/3 \end{cases}$$

This is the uniform distribution on $[0, 1/3]$.

- It is easy to see that

$$\int f(t)\,dt = \int_0^{1/3} 3\,dt = 1$$

so $f(t)$ does indeed define a distribution.

- It would NOT define a CDF, since $6t$ would exceed 1 beyond $t = 1/6$ and a CDF can never exceed 1 because it measures a probability.
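The last point can be illustrated numerically. The sketch below evaluates the piecewise function on a grid and checks the two properties used in the argument (values in $[0, 1]$, non-decreasing). The helper names `piecewise_cdf` and `looks_like_cdf`, the grid and the tolerance are assumptions made for illustration; a grid check is a sanity test, not a proof.

```python
def piecewise_cdf(t, slope):
    """Candidate CDF: 0 for t < 0, slope * t on [0, 1/3], 1 afterwards."""
    if t < 0:
        return 0.0
    if t <= 1 / 3:
        return slope * t
    return 1.0


def looks_like_cdf(slope, n=3000, tol=1e-12):
    """Grid check on [-0.5, 1]: values stay in [0, 1] and never decrease."""
    grid = [-0.5 + 1.5 * i / n for i in range(n + 1)]
    values = [piecewise_cdf(t, slope) for t in grid]
    bounded = all(-tol <= v <= 1.0 + tol for v in values)
    monotone = all(a <= b + tol for a, b in zip(values, values[1:]))
    return bounded and monotone


ok_with_3t = looks_like_cdf(3)  # the CDF from the exercise
ok_with_6t = looks_like_cdf(6)  # the modified definition
print(ok_with_3t, ok_with_6t)
```

With slope 6 the function reaches 2 at $t = 1/3$ and then drops back to 1, so it fails both the bound and the monotonicity check.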

3. Maximum Likelihood Estimators. The exponential distribution with parameter $\lambda$ is a probability measure on the positive real line whose pdf is given by

$$f(x) = \lambda e^{-\lambda x}, \quad \lambda > 0$$

Compute the Maximum Likelihood Estimator for the parameter $\lambda$. Briefly explain the different steps of your computation.

Answer: Suppose we observe $n$ samples $(x_1, \ldots, x_n)$ drawn from $\mathrm{Exp}(\lambda)$. We want to find the parameter $\lambda$ which maximises the likelihood of our observations, that is to say which maximises

$$\prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}.$$

We start by taking the logarithm in order to turn this product into a sum:

$$\ln \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \sum_{i=1}^{n} \left(\ln \lambda - \lambda x_i\right) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$$

To find the $\lambda$ which maximises this function we take the derivative with respect to $\lambda$, set it to zero, and solve for $\lambda$:

$$\frac{d}{d\lambda}\left(n \ln \lambda - \lambda \sum_{i=1}^{n} x_i\right) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0$$

which means that $\hat{\lambda} = \dfrac{n}{\sum_{i=1}^{n} x_i}$, which is simply the inverse of the sample mean.
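As a sanity check of the derivation, the closed-form estimator can be compared with the log-likelihood evaluated at nearby values of $\lambda$. This is a minimal sketch using only the standard library; the function names, the seed, the sample size and the "true" rate 2.0 are arbitrary illustrative assumptions.

```python
import math
import random


def exp_log_likelihood(lam, xs):
    """Log-likelihood n * ln(lam) - lam * sum(xs) derived above."""
    return len(xs) * math.log(lam) - lam * sum(xs)


def exp_mle(xs):
    """Closed-form maximiser: n / sum(xs), the inverse of the sample mean."""
    return len(xs) / sum(xs)


random.seed(0)        # arbitrary seed, only for reproducibility
true_lambda = 2.0     # illustrative value, not taken from the exercise
samples = [random.expovariate(true_lambda) for _ in range(10_000)]

lam_hat = exp_mle(samples)
ll_at_hat = exp_log_likelihood(lam_hat, samples)
ll_below = exp_log_likelihood(0.9 * lam_hat, samples)
ll_above = exp_log_likelihood(1.1 * lam_hat, samples)
print(lam_hat, ll_at_hat >= ll_below, ll_at_hat >= ll_above)
```

The log-likelihood is strictly concave in $\lambda$ (its second derivative is $-n/\lambda^2$), so the stationary point found above is the global maximum and both comparisons should hold.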

4. Descriptive statistics.
- State a condition on the pdf which guarantees that a probability distribution has no skew.
- State a condition which guarantees that a set of samples $x_1, \ldots, x_n$ has sample kurtosis equal to 0.
- Give a set of three samples $x_1, x_2, x_3$ such that their sample mean is 4, their median is 3 and their range is 7.
- Compute the mean, the standard deviation and the skew of the distribution on $\{1, 2, 3\}$ whose pdf/pmf is given by $f(1) = 1/4$, $f(2) = 1/4$, $f(3) = 1/2$.

Answer:

If the pdf is an even function (symmetric around 0), then the distribution will have no skew. More generally, if a pdf is symmetric around the mean of the distribution – that is to say if f (μ + x) = f (μ − x) – then it will have no skew.
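The last two questions of this exercise come down to short arithmetic, which the sketch below reproduces. Two assumptions are made here and are not taken from the sheet: the skew is computed as the standardised third central moment $E[(X-\mu)^3]/\sigma^3$ (the lectures may use a different convention), and the sample set `[1, 3, 8]` is just one candidate obtained by solving $x_2 = 3$, $x_1 + x_3 = 9$, $x_3 - x_1 = 7$.

```python
from fractions import Fraction
import math
import statistics

# pmf from the last question of the exercise.
pmf = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}


def central_moment(pmf, k, mean):
    """k-th central moment of a finite distribution given as {value: mass}."""
    return sum(mass * (x - mean) ** k for x, mass in pmf.items())


mean = sum(x * mass for x, mass in pmf.items())
variance = central_moment(pmf, 2, mean)
third_moment = central_moment(pmf, 3, mean)
std = math.sqrt(variance)
# Assumed definition of skew: standardised third central moment.
skew = float(third_moment) / std ** 3

# One candidate answer to the three-sample question (illustrative).
candidate = [1, 3, 8]
sample_mean = statistics.mean(candidate)
sample_median = statistics.median(candidate)
sample_range = max(candidate) - min(candidate)
print(mean, variance, std, skew, sample_mean, sample_median, sample_range)
```

The negative third central moment reflects the fact that most of the mass sits on the largest value, with a tail towards the smaller ones.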

5. Goodness of fit. Suppose each individual in a population can be assigned to one of four classes A, B, C or D. We want to test the hypothesis that the probability distribution of these classes is

$$d(A) = \frac{1}{2}, \quad d(B) = \frac{1}{4}, \quad d(C) = d(D) = \frac{1}{8}$$

After sampling 1,000 individuals we get 465 individuals in class A, 260 individuals in class B, 180 individuals in class C and the remaining individuals in class D.

- Write down the empirical probability measure.
- Compute the (Pearson) distance between the empirical probability measure and the probability measure under $H_0$.
- What is the approximate distribution of this distance (use Pearson’s theorem, as seen in the lectures)?
- Using $\alpha = 0.99$, can you reject $H_0$?

Answer:
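- The empirical probability measure is given by

$$d_{\mathrm{emp}} = \frac{465}{1000}\,\delta_A + \frac{260}{1000}\,\delta_B + \frac{180}{1000}\,\delta_C + \frac{95}{1000}\,\delta_D$$

This probability measure gives us the observed frequencies as probability masses. For example

$$d_{\mathrm{emp}}(A) = \frac{465}{1000}\,\delta_A(A) + \frac{260}{1000}\,\delta_B(A) + \frac{180}{1000}\,\delta_C(A) + \frac{95}{1000}\,\delta_D(A) = \frac{465}{1000}$$

since $\delta_A(x) = 1$ when $x = A$ and 0 else.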

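- The Pearson distance between $d$ and $d_{\mathrm{emp}}$ is given by

$$\mathrm{dist}(d, d_{\mathrm{emp}}) = \frac{\left(\frac{1}{2} - \frac{465}{1000}\right)^2}{\frac{1}{2}} + \frac{\left(\frac{1}{4} - \frac{260}{1000}\right)^2}{\frac{1}{4}} + \frac{\left(\frac{1}{8} - \frac{180}{1000}\right)^2}{\frac{1}{8}} + \frac{\left(\frac{1}{8} - \frac{95}{1000}\right)^2}{\frac{1}{8}} = 0.03425$$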

- According to the theorem discussed in the lectures, the distribution of $n \cdot \mathrm{dist}(d, d_{\mathrm{emp}})$ approaches the $\chi^2(3)$ distribution, with $3 = 4 - 1$ degrees of freedom, as $n \to \infty$. That means that the distribution of $\mathrm{dist}(d, d_{\mathrm{emp}})$ approaches the distribution $\frac{1}{n}\chi^2(3)$ as $n \to \infty$. We could compute this distribution (it is known to be a Gamma distribution) and use it to build a goodness-of-fit test, but in general the Pearson goodness-of-fit test is performed by using $n \cdot \mathrm{dist}(d, d_{\mathrm{emp}})$ and the $\chi^2$-distribution directly.
- The p-value (that is to say the probability under $H_0$ of getting an even larger value of $n \cdot \mathrm{dist}(d, d_{\mathrm{emp}})$) is given by

$$\chi^2(3)\big([1000 \cdot 0.03425, +\infty)\big) = 1.75 \times 10^{-7} < 1 - \alpha = 0.01$$

Under $H_0$ the observations are thus sufficiently unlikely for us to reject $H_0$. Note that the values above (both the statistic $1000 \cdot 0.03425 = 34.25$ and the p-value 1.75e-07) are exactly what you get if you call the Python function

scipy.stats.chisquare([465, 260, 180, 95], [500, 250, 125, 125])

which performs Pearson’s $\chi^2$-test.
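The same numbers can be reproduced step by step without SciPy. The sketch below is an illustration under stated assumptions: the helper names are made up, and the p-value relies on the closed-form survival function of the $\chi^2$ distribution with 3 degrees of freedom, $\operatorname{erfc}\!\big(\sqrt{x/2}\big) + \sqrt{2x/\pi}\,e^{-x/2}$, which is valid only for 3 degrees of freedom.

```python
import math

observed = [465, 260, 180, 95]        # counts for classes A, B, C, D
h0_probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]  # distribution d under H0
n = sum(observed)


def pearson_distance(reference, empirical):
    """Sum over classes of (d(x) - d_emp(x))^2 / d(x)."""
    return sum((p - q) ** 2 / p for p, q in zip(reference, empirical))


def chi2_sf_3dof(x):
    """Survival function P(X >= x) of chi-square with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


empirical = [count / n for count in observed]
dist = pearson_distance(h0_probs, empirical)
statistic = n * dist
p_value = chi2_sf_3dof(statistic)
reject_h0 = p_value < 1 - 0.99        # threshold 1 - alpha from the exercise
print(dist, statistic, p_value, reject_h0)
```

Working with the scaled statistic $n \cdot \mathrm{dist}$ mirrors the remark above: the test is carried out against $\chi^2(3)$ directly rather than against the rescaled distribution.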

6. Joint distributions.
- Find the Bernoulli distribution $\mathrm{Bern}(p)$ on $\{0, 1\}$ such that the product $\mathrm{Bern}(p) \otimes \mathrm{Bern}(p)$ is given by

$$\begin{aligned}
(\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(0, 0) &= \tfrac{1}{16} \\
(\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(0, 1) &= \tfrac{3}{16} \\
(\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(1, 0) &= \tfrac{3}{16} \\
(\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(1, 1) &= \tfrac{9}{16}
\end{aligned}$$

- What is the covariance of this joint distribution?
- Compute the marginals of this joint distribution.
- Consider the map $\mathrm{AND} : \{0, 1\} \times \{0, 1\} \to \{0, 1\}$ given by the usual boolean AND operation (0 AND 0 = 0 AND 1 = 1 AND 0 = 0, 1 AND 1 = 1). Compute the pushforward of $\mathrm{Bern}(p) \otimes \mathrm{Bern}(p)$ under the map AND. (Simply apply the definition; this is very similar to an independent +, but it uses AND instead.)
Consider now the distribution d on { 0 , 1 } × { 0 , 1 } given by

d(0, 0) =

d(0, 1) =

d(1, 0) =

d(1, 1) = 0

- Show that $d$ is a joint probability distribution but that it is not a product measure.
- Compute the marginals of $d$ as well as their means and variances.
- Compute the covariance and the correlation of $d$. What can you deduce from the sign of the correlation coefficient? Is this observation reflected in the definition of $d$?

Answer:

- Observe that by definition

$$(\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(0, 0) = \mathrm{Bern}(p)(0) \cdot \mathrm{Bern}(p)(0) = (\mathrm{Bern}(p)(0))^2 = \frac{1}{16}$$

and thus $\mathrm{Bern}(p)(0) = \sqrt{\tfrac{1}{16}} = \tfrac{1}{4}$. It immediately follows that $\mathrm{Bern}(p)(1) = \tfrac{3}{4}$ and thus $p = \tfrac{3}{4}$, since $p$ is the probability of success (that is to say the probability mass of $\{1\}$). We could have used the same argument with $(\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(1, 1) = \tfrac{9}{16}$ to directly get the same answer.

- The covariance of a product of distributions is always equal to zero (see lectures).
- The marginals of a product of distributions are given by the components of the product; in this case both marginals are given by $\mathrm{Bern}(3/4)$.
- By definition of the pushforward we get

$$\begin{aligned}
(\mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4))(0) &= (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))\,\{(x, y) \mid x \;\mathrm{AND}\; y = 0\} \\
&= (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))(0, 0) + (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))(0, 1) + (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))(1, 0) \\
&= \frac{7}{16}
\end{aligned}$$

It follows that $(\mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4))(1) = 1 - (\mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4))(0) = \tfrac{9}{16}$ and so

$$\mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4) = \mathrm{Bern}(9/16)$$

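Alternatively, we could have started with 1 and immediately gotten that

$$\begin{aligned}
(\mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4))(1) &= (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))\,\{(x, y) \mid x \;\mathrm{AND}\; y = 1\} \\
&= (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))(1, 1) \\
&= \frac{9}{16}
\end{aligned}$$

More generally, $\mathrm{Bern}(p) \;\mathrm{AND}\; \mathrm{Bern}(p) = \mathrm{Bern}(p^2)$.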
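The product measure, its covariance and the pushforward under AND can all be computed mechanically from the definitions. The following is a minimal sketch with exact fractions; the dictionary representation of a measure and the helper names (`bernoulli`, `product_measure`, `pushforward`, `covariance`) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def bernoulli(p):
    """Bern(p) on {0, 1} as a dictionary {outcome: mass}."""
    return {0: 1 - p, 1: p}


def product_measure(d1, d2):
    """Product d1 (x) d2: the mass of (x, y) is d1(x) * d2(y)."""
    return {(x, y): d1[x] * d2[y] for x, y in product(d1, d2)}


def pushforward(measure, fn):
    """Mass of each image point = total mass of its preimage under fn."""
    image = {}
    for outcome, mass in measure.items():
        key = fn(*outcome)
        image[key] = image.get(key, 0) + mass
    return image


def covariance(joint):
    """E[XY] - E[X]E[Y] for a joint distribution on pairs."""
    e_x = sum(x * m for (x, _), m in joint.items())
    e_y = sum(y * m for (_, y), m in joint.items())
    e_xy = sum(x * y * m for (x, y), m in joint.items())
    return e_xy - e_x * e_y


p = Fraction(3, 4)
joint = product_measure(bernoulli(p), bernoulli(p))
and_distribution = pushforward(joint, lambda x, y: x & y)
joint_covariance = covariance(joint)
print(joint, and_distribution, joint_covariance)
```

Because `pushforward` only sums the mass of preimages, the same helper works for any other map on pairs, for instance addition in place of AND.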

- The proof is by contradiction. Suppose that $d$ is the product of two distributions, that is to say $d = d_1 \otimes d_2$. Then $d(1, 1) = (d_1 \otimes d_2)(1, 1) = d_1(1)\,d_2(1) = 0$, which means that either $d_1(1) = 0$ or $d_2(1) = 0$. The first possibility implies that

$$d(1, 0) = (d_1 \otimes d_2)(1, 0) = d_1(1)\,d_2(0) = 0$$

which contradicts the fact that $d(1, 0) = 3/8$. The second possibility leads to a contradiction for the same reason. So $d$ cannot be a product of two distributions.
