Applied Statistics (ECS764P) - Revision exercises
Fredrik Dahlqvist
14 December 2022
- Basic definitions. Consider the density function given by
$$f(x) = \begin{cases} \alpha x & \text{if } 0 \le x \le 2 \\ 0 & \text{else} \end{cases}$$
- For what value of α does f define a probability density function?
- Using this value of α compute the CDF of the distribution.
- Still using this value of α compute the probability mass of the interval [0, 1] and the probability mass of the interval [1/2, 3/2].
Answer:
- For f(t) to define a probability density function we need $\int f(t)\,dt = 1$. This means that we need
$$\alpha \left[ \frac{t^2}{2} \right]_0^2 = \alpha \cdot \frac{4}{2} = 2\alpha = 1$$
and thus α = 1/2.
- By definition, the CDF is given by
$$F(t) = \int_{-\infty}^{t} f(s)\,ds = \begin{cases} 0 & t < 0 \\ \int_0^t \frac{s}{2}\,ds = \frac{t^2}{4} & 0 \le t \le 2 \\ 1 & t > 2 \end{cases}$$
- The probability mass of [0, 1] is just the CDF evaluated at 1, i.e. F(1) = 1/4. The probability mass of [1/2, 3/2] is given by
$$\int_{1/2}^{3/2} f(t)\,dt = \frac{1}{2}$$
which can be computed directly, or we can use $F(3/2) - F(1/2) = \frac{9}{16} - \frac{1}{16} = \frac{1}{2}$.
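The three answers above can be checked with exact rational arithmetic. The following is only a sketch: the helper names `cdf` and `interval_mass` are made up for this illustration and are not part of the exercise.

```python
from fractions import Fraction

# Normalisation: the integral of alpha * x over [0, 2] is alpha * 2**2 / 2 = 2 * alpha,
# so alpha must be 1 / 2 for the total mass to equal 1.
alpha = 1 / (Fraction(2) ** 2 / 2)

def cdf(t):
    """CDF F(t) of the density f(x) = alpha * x on [0, 2] (hypothetical helper)."""
    t = Fraction(t)
    if t < 0:
        return Fraction(0)
    if t > 2:
        return Fraction(1)
    return alpha * t ** 2 / 2  # equals t**2 / 4 when alpha = 1/2

def interval_mass(a, b):
    """Probability mass of [a, b], computed as F(b) - F(a)."""
    return cdf(b) - cdf(a)

print(alpha)                                          # value of alpha
print(interval_mass(0, 1))                            # mass of [0, 1]
print(interval_mass(Fraction(1, 2), Fraction(3, 2)))  # mass of [1/2, 3/2]
```

Using `Fraction` rather than floats keeps the results directly comparable with the hand-computed values 1/2, 1/4 and 1/2.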
- Basic definitions. Consider the CDF defined by
$$F(t) = \begin{cases} 0 & \text{if } t < 0 \\ 3t & \text{if } 0 \le t \le 1/3 \\ 1 & \text{else} \end{cases}$$
- Compute the corresponding PDF. Can you recognise this distribution?
- Check that it does indeed define a probability density function
- If 3t were replaced by 6t in the definition above (and everything else remained the same), would F(t) be a CDF? Justify your answer in one sentence.
Answer:
- By the fundamental theorem of calculus,
$$f(t) = \frac{d}{dt} F(t) = \begin{cases} 0 & t < 0 \\ 3 & 0 \le t \le 1/3 \\ 0 & t > 1/3 \end{cases}$$
This is the uniform distribution on [0, 1/3].
- It is easy to see that
$$\int f(t)\,dt = \int_0^{1/3} 3\,dt = 1$$
so f(t) does indeed define a distribution.
- It would NOT define a CDF since it would exceed 1 for t between 1/6 and 1/3, and a CDF can never exceed 1 since it measures a probability.
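A quick numerical sanity check of the last two answers might look as follows. This is a sketch only: the function names and the grid used for the check are arbitrary choices for this illustration, and a grid check is of course not a proof.

```python
from fractions import Fraction

def cdf(t, slope=3):
    """Candidate CDF: 0 for t < 0, slope * t on [0, 1/3], and 1 afterwards."""
    t = Fraction(t)
    if t < 0:
        return Fraction(0)
    if t <= Fraction(1, 3):
        return slope * t
    return Fraction(1)

def is_valid_on_grid(slope, steps=60):
    """Check 0 <= F <= 1 and monotonicity on an evenly spaced grid over [-1/2, 1]."""
    grid = [Fraction(-1, 2) + Fraction(3, 2) * k / steps for k in range(steps + 1)]
    values = [cdf(t, slope) for t in grid]
    bounded = all(0 <= v <= 1 for v in values)
    monotone = all(a <= b for a, b in zip(values, values[1:]))
    return bounded and monotone

# Total mass of the constant density 3 on [0, 1/3]: height times width.
total_mass = 3 * Fraction(1, 3)

print(total_mass)
print(is_valid_on_grid(3))  # slope 3: stays within [0, 1] and is non-decreasing
print(is_valid_on_grid(6))  # slope 6: exceeds 1 on part of [0, 1/3]
```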
- Maximum Likelihood Estimators. The exponential distribution with parameter λ is a probability measure on the positive real line whose pdf is given by
$$f(x) = \lambda e^{-\lambda x}, \quad \lambda > 0$$
Compute the Maximum Likelihood Estimator for the parameter λ. Briefly explain the different steps of your computation.
Answer:
Suppose we observe n samples $(x_1, \dots, x_n)$ drawn from Exp(λ). We want to find the parameter λ which maximises the likelihood of our observations, that is to say the λ that maximises
$$\prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}.$$
We start by taking the logarithm in order to turn this product into a sum.
$$\ln \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \sum_{i=1}^{n} \left( \ln \lambda - \lambda x_i \right) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$$
To find the λ which maximises this function we take the derivative with respect to λ, set it to zero, and solve for λ
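$$\frac{d}{d\lambda} \left( n \ln \lambda - \lambda \sum_{i=1}^{n} x_i \right) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0$$
which means that $\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i}$, which is simply the inverse of the sample mean.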
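The result can be illustrated on simulated data. The sketch below assumes a made-up true parameter (`true_lambda = 2.0`), a fixed seed and a sample size chosen only for this illustration; it compares the closed-form estimate with the log-likelihood at a few nearby values.

```python
import math
import random

def exponential_mle(samples):
    """Closed-form MLE from the derivation: n divided by the sum of the samples."""
    return len(samples) / sum(samples)

def log_likelihood(lam, samples):
    """Log-likelihood n * ln(lam) - lam * sum(x_i) of Exp(lam)."""
    return len(samples) * math.log(lam) - lam * sum(samples)

random.seed(0)       # fixed seed so the run is repeatable
true_lambda = 2.0    # hypothetical value chosen for the illustration
samples = [random.expovariate(true_lambda) for _ in range(10_000)]

lam_hat = exponential_mle(samples)

# Crude check: the closed-form estimate should do at least as well as nearby candidates.
candidates = [lam_hat * factor for factor in (0.8, 0.9, 1.1, 1.2)]
best_nearby = max(log_likelihood(c, samples) for c in candidates)

print(lam_hat)
print(log_likelihood(lam_hat, samples) >= best_nearby)
```

With this many samples the estimate should land close to the chosen `true_lambda`, although the exact value depends on the random draw.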
- Descriptive statistics.
- State a condition on the pdf which guarantees that a probability distribution has no skew.
- State a condition which guarantees that a set of samples $x_1, \dots, x_n$ has sample kurtosis equal to 0.
- Give a set of three samples $x_1, x_2, x_3$ such that their sample mean is 4, their median is 3 and their range is 7.
- Compute the mean, the standard deviation and the skew of the distribution on {1, 2, 3} whose pdf/pmf is given by f(1) = 1/4, f(2) = 1/4, f(3) = 1/2.
Answer:
- If the pdf is an even function (symmetric around 0), then the distribution will have no skew. More generally, if a pdf is symmetric around the mean of the distribution – that is to say if f (μ + x) = f (μ − x) – then it will have no skew.
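The symmetry condition can be illustrated on finite distributions. The sketch below assumes that "skew" means the standardised third central moment (the lectures may use a different convention); the `symmetric` pmf is a made-up example, while `exercise` is the pmf on {1, 2, 3} from the question above.

```python
import math

def moments(pmf):
    """Return (mean, standard deviation, skewness) of a finite pmf {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    std = math.sqrt(variance)
    third_central = sum((x - mean) ** 3 * p for x, p in pmf.items())
    # Assumption: skewness is the third central moment divided by std cubed.
    return mean, std, third_central / std ** 3

symmetric = {1: 0.25, 2: 0.5, 3: 0.25}  # hypothetical pmf, symmetric around its mean 2
exercise = {1: 0.25, 2: 0.25, 3: 0.5}   # pmf from the exercise above

print(moments(symmetric))  # the skewness component is zero
print(moments(exercise))   # asymmetric pmf: the skewness component is non-zero
```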
- Goodness of fit. Suppose each individual in a population can be assigned to one of four classes A, B, C or D. We want to test the hypothesis that the probability distribution of these classes is
$$d(A) = \frac{1}{2}, \quad d(B) = \frac{1}{4}, \quad d(C) = d(D) = \frac{1}{8}$$
After sampling 1,000 individuals we get 465 individuals in class A, 260 individuals in class B, 180 individuals in class C and the remaining individuals in class D.
- Write down the empirical probability measure.
- Compute the (Pearson) distance between the empirical probability measure and the probability measure under $H_0$.
- What is the approximate distribution of this distance (use Pearson’s theorem, as seen in the lectures)?
- Using α = 0.99 can you reject $H_0$?
Answer:
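- The empirical probability measure is given by
$$d_{\mathrm{emp}} = \frac{465}{1000}\,\delta_A + \frac{260}{1000}\,\delta_B + \frac{180}{1000}\,\delta_C + \frac{95}{1000}\,\delta_D$$
This probability measure gives us the observed frequencies as probability masses. For example
$$d_{\mathrm{emp}}(A) = \frac{465}{1000}\,\delta_A(A) + \frac{260}{1000}\,\delta_B(A) + \frac{180}{1000}\,\delta_C(A) + \frac{95}{1000}\,\delta_D(A) = \frac{465}{1000}$$
since $\delta_A(x) = 1$ when x = A and 0 otherwise.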
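- The Pearson distance between $d$ and $d_{\mathrm{emp}}$ is given by
$$\mathrm{dist}(d, d_{\mathrm{emp}}) = \frac{\left(\frac{1}{2} - \frac{465}{1000}\right)^2}{\frac{1}{2}} + \frac{\left(\frac{1}{4} - \frac{260}{1000}\right)^2}{\frac{1}{4}} + \frac{\left(\frac{1}{8} - \frac{180}{1000}\right)^2}{\frac{1}{8}} + \frac{\left(\frac{1}{8} - \frac{95}{1000}\right)^2}{\frac{1}{8}} = 0.03425$$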
- According to the theorem discussed in the lectures the distribution of $n \cdot \mathrm{dist}(d, d_{\mathrm{emp}})$ approaches the $\chi^2(3)$ distribution with 3 = 4 − 1 degrees of freedom as n → ∞. That means that the distribution of $\mathrm{dist}(d, d_{\mathrm{emp}})$ approaches the distribution $\frac{1}{n}\chi^2(3)$ as n → ∞. We could compute this distribution (it is known to be a Gamma distribution) and use it to build a goodness-of-fit test, but in general the Pearson goodness-of-fit test is performed by using $n \cdot \mathrm{dist}(d, d_{\mathrm{emp}})$ and the $\chi^2$-distribution directly.
- The p-value (that is to say the probability under $H_0$ of getting an even larger value of $n \cdot \mathrm{dist}(d, d_{\mathrm{emp}})$) is given by
$$\chi^2(3)\big([1000 \cdot 0.03425, +\infty)\big) = 1.75 \times 10^{-7} < 1 - \alpha = 0.01$$
The observations are thus sufficiently unlikely under $H_0$ (according to this approximate distribution) to reject $H_0$. Note that the values above (both the statistic 1000 · 0.03425 = 34.25 and the p-value 1.75e-07) are exactly what you get if you call the Python function
scipy.stats.chisquare([465, 260, 180, 95], [500, 250, 125, 125])
which performs Pearson’s χ²-test.
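The whole computation can also be reproduced without SciPy. The sketch below is dependency-free and relies on one swapped-in ingredient that is not from the lectures: the closed-form upper tail of the χ² distribution with 3 degrees of freedom, erfc(√(x/2)) + √(2x/π)·e^(−x/2). The variable and function names are chosen for this illustration only.

```python
import math

observed = {"A": 465, "B": 260, "C": 180, "D": 95}
hypothesis = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}
n = sum(observed.values())

# Empirical probability measure: observed frequencies as probability masses.
empirical = {k: count / n for k, count in observed.items()}

# Pearson distance between the hypothesised and the empirical measure.
distance = sum((hypothesis[k] - empirical[k]) ** 2 / hypothesis[k] for k in hypothesis)
statistic = n * distance

def chi2_sf_3dof(x):
    """Upper tail probability of the chi-squared distribution with 3 degrees of freedom.

    Closed form valid for 3 degrees of freedom only (not a general implementation).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_3dof(statistic)

print(statistic)
print(p_value)
print(p_value < 1 - 0.99)  # True means H_0 is rejected at alpha = 0.99
```

For other numbers of classes the tail formula changes, so in practice a library routine such as the `scipy.stats.chisquare` call quoted above is the more convenient choice.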
- Joint distributions.
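- Find the Bernoulli distribution Bern(p) on {0, 1} such that the product Bern(p) ⊗ Bern(p) is given by
$$\begin{aligned} \mathrm{Bern}(p) \otimes \mathrm{Bern}(p)\,(0, 0) &= \frac{1}{16} \\ \mathrm{Bern}(p) \otimes \mathrm{Bern}(p)\,(0, 1) &= \frac{3}{16} \\ \mathrm{Bern}(p) \otimes \mathrm{Bern}(p)\,(1, 0) &= \frac{3}{16} \\ \mathrm{Bern}(p) \otimes \mathrm{Bern}(p)\,(1, 1) &= \frac{9}{16} \end{aligned}$$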
- What is the covariance of this joint distribution?
- Compute the marginals of this joint distribution.
- Consider the map AND : {0, 1} × {0, 1} → {0, 1} given by the usual boolean AND operation (0 AND 0 = 0 AND 1 = 1 AND 0 = 0, 1 AND 1 = 1). Compute the pushforward of Bern(p) ⊗ Bern(p) under the map AND. (Simply apply the definition; this is very similar to an independent +, but it uses AND instead.)
- Consider now the distribution d on { 0 , 1 } × { 0 , 1 } given by
d(0, 0) =
d(0, 1) =
d(1, 0) =
d(1, 1) = 0
Show that d is a joint probability distribution but that it is not a product measure.
- Compute the marginals of d as well as their means and variances.
- Compute the covariance and the correlation of d. What can you deduce from the sign of the correlation coefficient? Is this observation reflected in the definition of d?
Answer:
- Observe that by definition
$$\mathrm{Bern}(p) \otimes \mathrm{Bern}(p)\,(0, 0) = \mathrm{Bern}(p)(0) \cdot \mathrm{Bern}(p)(0) = \big(\mathrm{Bern}(p)(0)\big)^2 = \frac{1}{16}$$
and thus $\mathrm{Bern}(p)(0) = \sqrt{\frac{1}{16}} = \frac{1}{4}$. It immediately follows that $\mathrm{Bern}(p)(1) = \frac{3}{4}$ and thus $p = \frac{3}{4}$, since p is the probability of success (that is to say the probability mass of {1}). We could have used the same argument with $\mathrm{Bern}(p) \otimes \mathrm{Bern}(p)\,(1, 1) = \frac{9}{16}$ to get the same answer directly.
- The covariance of a product of distributions is always equal to zero (see lectures).
- The marginals of a product of distributions are given by the components of the product; in this case both marginals are given by Bern(3/4).
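- By definition of the pushforward we get
$$\begin{aligned} \mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4)\,(0) &= \mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4)\,\{(x, y) \mid x \;\mathrm{AND}\; y = 0\} \\ &= \mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4)\,(0, 0) + \mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4)\,(0, 1) + \mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4)\,(1, 0) \\ &= \frac{7}{16} \end{aligned}$$
It follows that $\mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4)\,(1) = 1 - \mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4)\,(0) = \frac{9}{16}$ and so
$$\mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4) = \mathrm{Bern}(9/16)$$
Alternatively, we could have started with 1 and immediately gotten that
$$\begin{aligned} \mathrm{Bern}(3/4) \;\mathrm{AND}\; \mathrm{Bern}(3/4)\,(1) &= \mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4)\,\{(x, y) \mid x \;\mathrm{AND}\; y = 1\} \\ &= \mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4)\,(1, 1) \\ &= \frac{9}{16} \end{aligned}$$
More generally, $\mathrm{Bern}(p) \;\mathrm{AND}\; \mathrm{Bern}(p) = \mathrm{Bern}(p^2)$.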
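The product, covariance and pushforward computations above can be sketched with dictionaries as finite probability measures. All function names below are hypothetical helpers written for this illustration, not notation from the lectures.

```python
from fractions import Fraction
from itertools import product

def bernoulli(p):
    """Bern(p) as a dict mapping the outcomes 0 and 1 to their probability masses."""
    return {0: 1 - p, 1: p}

def product_measure(d1, d2):
    """Product measure d1 ⊗ d2 on pairs (x, y)."""
    return {(x, y): d1[x] * d2[y] for x, y in product(d1, d2)}

def covariance(joint):
    """Covariance E[XY] - E[X]E[Y] of a joint distribution on pairs of numbers."""
    ex = sum(x * m for (x, _), m in joint.items())
    ey = sum(y * m for (_, y), m in joint.items())
    exy = sum(x * y * m for (x, y), m in joint.items())
    return exy - ex * ey

def pushforward(joint, f):
    """Pushforward of a joint distribution under f: add up the mass of each preimage."""
    result = {}
    for (x, y), mass in joint.items():
        result[f(x, y)] = result.get(f(x, y), 0) + mass
    return result

p = Fraction(3, 4)
joint = product_measure(bernoulli(p), bernoulli(p))
and_dist = pushforward(joint, lambda x, y: x & y)  # bitwise & is boolean AND on {0, 1}

print(joint)
print(covariance(joint))
print(and_dist)
```

The same `pushforward` helper works for any map on pairs, for example `lambda x, y: x + y` for the independent sum mentioned in the question.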
- The proof is by contradiction. Suppose that d is the product of two distributions, that is to say $d = d_1 \otimes d_2$. Then $d(1, 1) = d_1 \otimes d_2\,(1, 1) = d_1(1)\,d_2(1) = 0$, which means that either $d_1(1) = 0$ or $d_2(1) = 0$. The first possibility implies that
$$d(1, 0) = d_1 \otimes d_2\,(1, 0) = d_1(1)\,d_2(0) = 0$$
which contradicts the fact that d(1, 0) = 3/8. The second possibility leads to a contradiction for the same reason. So d cannot be a product of two distributions.