Exercises and Solutions, Exercises of Applied Statistics

Revision exercise set for Applied Statistics, covering probability distributions, PDFs/CDFs, maximum likelihood estimation, descriptive statistics, hypothesis testing, goodness-of-fit tests, and joint distributions, with worked solutions and explanations.

Typology: Exercises

2022/2023

Uploaded on 12/16/2025

venkataramanan
Applied Statistics (ECS764P) - Revision exercises
Fredrik Dahlqvist
14 December 2022


  1. Basic definitions. Consider the density function given by

$$f(x) = \begin{cases} \alpha x & \text{if } 0 \le x \le 2 \\ 0 & \text{else} \end{cases}$$

  • For what value of α does f define a probability density function?
  • Using this value of α, compute the CDF of the distribution.
  • Still using this value of α, compute the probability mass of the interval [0, 1] and the probability mass of the interval [1/2, 3/2].

Answer:

  • For f(t) to define a probability density function we need

$$\int f(t)\,dt = 1.$$

This means that we need

$$\left[\alpha \frac{t^2}{2}\right]_0^2 = \alpha \frac{4}{2} = 1$$

and thus α = 1/2.

  • By definition, the CDF is given by

$$F(t) = \int_{-\infty}^{t} f(s)\,ds = \begin{cases} 0 & t < 0 \\ \int_0^t \frac{s}{2}\,ds = \frac{t^2}{4} & 0 \le t \le 2 \\ 1 & t > 2 \end{cases}$$

  • The probability mass of [0, 1] is just the CDF evaluated at 1, i.e. 1/4 (since F(0) = 0). The probability mass of [1/2, 3/2] is given by

$$\int_{1/2}^{3/2} f(t)\,dt = \frac{1}{2}$$

which can be computed directly, or we can use F(3/2) − F(1/2) = 9/16 − 1/16 = 1/2.
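The two interval masses above can be checked with exact arithmetic. The following is a minimal sketch, assuming α = 1/2 as derived in the answer; the function names `cdf` and `interval_mass` are illustrative and not part of the exercise sheet.

```python
from fractions import Fraction

def cdf(t):
    """CDF of the density f(x) = x/2 on [0, 2], i.e. with alpha = 1/2."""
    t = Fraction(t)
    if t < 0:
        return Fraction(0)
    if t <= 2:
        return t * t / 4
    return Fraction(1)

def interval_mass(a, b):
    """Probability mass of [a, b], computed as F(b) - F(a)."""
    return cdf(b) - cdf(a)

mass_0_1 = interval_mass(0, 1)
mass_mid = interval_mass(Fraction(1, 2), Fraction(3, 2))
print(mass_0_1, mass_mid)  # 1/4 1/2
```

Using `Fraction` avoids floating-point rounding, so the results can be compared directly with the fractions in the answer.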

  2. Basic definitions. Consider the CDF defined by

$$F(t) = \begin{cases} 0 & \text{if } t < 0 \\ 3t & \text{if } 0 \le t \le 1/3 \\ 1 & \text{else} \end{cases}$$

  • Compute the corresponding PDF. Can you recognise this distribution?
  • Check that it does indeed define a probability density function.
  • If 3t were replaced by 6t in the definition above (and everything else remained the same), would F(t) be a CDF? Justify your answer in one sentence.

Answer:

  • By using the fundamental theorem of calculus

$$f(t) = \frac{d}{dt} F(t) = \begin{cases} 0 & t < 0 \\ 3 & 0 \le t \le 1/3 \\ 0 & t > 1/3 \end{cases}$$

This is the uniform distribution on [0, 1/3].

  • It is easy to see that

$$\int f(t)\,dt = \int_0^{1/3} 3\,dt = 1$$

so f(t), which is also non-negative, does indeed define a distribution.

  • It would NOT define a CDF since it would exceed 1 for 1/6 < t ≤ 1/3, and a CDF can never exceed one since it measures a probability.
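Both claims in this answer can be checked with a few lines of exact arithmetic. This is a sketch only; the `slope` parameter and the name `THIRD` are introduced here for illustration and do not appear in the exercise.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

def cdf(t, slope=3):
    """Piecewise function from the exercise: 0 below 0, slope*t on [0, 1/3], 1 above."""
    t = Fraction(t)
    if t < 0:
        return Fraction(0)
    if t <= THIRD:
        return slope * t
    return Fraction(1)

# With slope 3 the total mass is F(1/3) - F(0) = 1, as for the uniform distribution on [0, 1/3].
total_mass = cdf(THIRD) - cdf(0)

# With slope 6 the function reaches 2 at t = 1/3 and then drops back to 1.
value_at_third = cdf(THIRD, slope=6)
value_after = cdf(Fraction(1, 2), slope=6)
print(total_mass, value_at_third, value_after)  # 1 2 1
```

With slope 6 the function both exceeds 1 and decreases after t = 1/3, so it fails two requirements of a CDF at once.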
  3. Maximum Likelihood Estimators. The exponential distribution with parameter λ is a probability measure on the positive real line whose pdf is given by

$$f(x) = \lambda e^{-\lambda x}, \quad \lambda > 0$$

Compute the Maximum Likelihood Estimator for the parameter λ. Briefly explain the different steps of your computation.

Answer:

Suppose we observe n samples (x₁, …, xₙ) drawn from Exp(λ). We want to find the parameter λ which maximises the likelihood of our observations, that is to say which maximises

$$\prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}.$$

We start by taking the logarithm in order to turn this product into a sum.

$$\ln \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \sum_{i=1}^{n} (\ln \lambda - \lambda x_i) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$$

To find the λ which maximises this function we take the derivative with respect to λ, set it to zero, and solve for λ

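$$\frac{d}{d\lambda}\left(n \ln \lambda - \lambda \sum_{i=1}^{n} x_i\right) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0$$

which means that

$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i},$$

which is simply the inverse of the sample mean.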
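The derivation can be sanity-checked numerically. The sketch below evaluates the log-likelihood n ln λ − λ Σ xᵢ at the estimator and at nearby values; the sample values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def log_likelihood(lam, xs):
    """Log-likelihood of Exp(lam) for the samples xs: n*ln(lam) - lam*sum(xs)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def exponential_mle(xs):
    """Maximum likelihood estimate: n / sum(xs), the inverse of the sample mean."""
    return len(xs) / sum(xs)

samples = [0.5, 1.5, 2.0, 4.0]  # hypothetical observations, sum = 8
lam_hat = exponential_mle(samples)
print(lam_hat)  # 0.5

# The log-likelihood is lower on both sides of the estimate.
ll_at_hat = log_likelihood(lam_hat, samples)
ll_below = log_likelihood(0.9 * lam_hat, samples)
ll_above = log_likelihood(1.1 * lam_hat, samples)
```

The second derivative of the log-likelihood is −n/λ², which is negative for every λ > 0, so the stationary point is indeed a maximum.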

  4. Descriptive statistics.
    • State a condition on the pdf which guarantees that a probability distribution has no skew.
    • State a condition which guarantees that a set of samples x₁, …, xₙ has sample kurtosis equal to 0.
    • Give a set of three samples x₁, x₂, x₃ such that their sample mean is 4, their median is 3 and their range is 7.
    • Compute the mean, the standard deviation and the skew of the distribution on {1, 2, 3} whose pdf/pmf is given by f(1) = 1/4, f(2) = 1/4, f(3) = 1/2.

Answer:

  • If the pdf is an even function (symmetric around 0), then the distribution will have no skew. More generally, if a pdf is symmetric around the mean of the distribution – that is to say if f (μ + x) = f (μ − x) – then it will have no skew.
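The two computational parts of this exercise can be worked through as follows. This is a sketch under stated assumptions, not the solution from the sheet: it takes the skew to be the third standardised moment E[(X − μ)³]/σ³ (the lectures may use a different convention), and the candidate samples 1, 3, 8 are one possible choice obtained by solving x₁ + 3 + x₃ = 12 and x₃ − x₁ = 7.

```python
import math
import statistics
from fractions import Fraction

# pmf on {1, 2, 3} from the exercise
pmf = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}

mean = sum(x * p for x, p in pmf.items())                        # 9/4
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())      # 11/16
std = math.sqrt(variance)
third_moment = sum((x - mean) ** 3 * p for x, p in pmf.items())  # -9/32
# Assumption: skew defined as the third standardised moment.
skew = float(third_moment) / std ** 3

# One candidate set of three samples with mean 4, median 3 and range 7.
samples = [1, 3, 8]
sample_mean = statistics.mean(samples)
sample_median = statistics.median(samples)
sample_range = max(samples) - min(samples)
print(mean, variance, round(std, 4), round(skew, 4))
```

The skew comes out negative, which matches the shape of the pmf: most of the mass sits at 3 and the tail extends to the left.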
  5. Goodness of fit. Suppose each individual in a population can be assigned to one of four classes A, B, C or D. We want to test the hypothesis that the probability distribution of these classes is

$$d(A) = \frac{1}{2}, \quad d(B) = \frac{1}{4}, \quad d(C) = d(D) = \frac{1}{8}$$

After sampling 1,000 individuals we get 465 individuals in class A, 260 individuals in class B, 180 individuals in class C and the remaining individuals in class D.

  • Write down the empirical probability measure.
  • Compute the (Pearson) distance between the empirical probability measure and the probability measure under H₀.
  • What is the approximate distribution of this distance (use Pearson’s theorem, as seen in the lectures)?
  • Using α = 0.99, can you reject H₀?

Answer:

  • The empirical probability measure is given by

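$$d_{\mathrm{emp}} = \frac{465}{1000}\delta_A + \frac{260}{1000}\delta_B + \frac{180}{1000}\delta_C + \frac{95}{1000}\delta_D$$

This probability measure gives us the observed frequencies as probability masses. For example

$$d_{\mathrm{emp}}(A) = \frac{465}{1000}\delta_A(A) + \frac{260}{1000}\delta_B(A) + \frac{180}{1000}\delta_C(A) + \frac{95}{1000}\delta_D(A) = \frac{465}{1000}$$

since δ_A(x) = 1 when x = A and 0 else.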

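  • The Pearson distance between d and d_emp is given by

$$\mathrm{dist}(d, d_{\mathrm{emp}}) = \frac{\left(\frac{1}{2} - \frac{465}{1000}\right)^2}{\frac{1}{2}} + \frac{\left(\frac{1}{4} - \frac{260}{1000}\right)^2}{\frac{1}{4}} + \frac{\left(\frac{1}{8} - \frac{180}{1000}\right)^2}{\frac{1}{8}} + \frac{\left(\frac{1}{8} - \frac{95}{1000}\right)^2}{\frac{1}{8}} = 0.03425$$
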

  • According to the theorem discussed in the lectures, the distribution of n · dist(d, d_emp) approaches the χ²(3) distribution, with 3 = 4 − 1 degrees of freedom, as n → ∞. That means that the distribution of dist(d, d_emp) approaches the distribution (1/n) χ²(3) as n → ∞. We could compute this distribution (it is known to be a Gamma distribution) and use it to build a goodness-of-fit test, but in general the Pearson goodness-of-fit test is performed by using n · dist(d, d_emp) and the χ²-distribution directly.
  • The p-value (that is to say the probability under H₀ of getting an even larger value of n · dist(d, d_emp)) is given by

$$\chi^2(3)\big([1000 \cdot 0.03425, +\infty)\big) = 1.75 \times 10^{-7} < 1 - \alpha = 0.01$$

The observations are thus (approximately) sufficiently unlikely under H₀ to reject H₀. Note that the values above (both the statistic 1000 · 0.03425 = 34.25 and the p-value 1.75e−07) are exactly what you get if you call the Python function

scipy.stats.chisquare([465, 260, 180, 95], [500, 250, 125, 125])

which performs Pearson’s χ²-test.
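The statistic and p-value can also be reproduced without SciPy. The sketch below uses only the standard library and relies on the closed form of the χ² survival function for exactly 3 degrees of freedom, erfc(√(x/2)) + √(2x/π) e^(−x/2); the helper name `chi2_sf_3df` is introduced here for illustration and is not a SciPy function.

```python
import math

observed = [465, 260, 180, 95]
null_probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]  # d(A), d(B), d(C), d(D) under H0

n = sum(observed)
expected = [n * p for p in null_probs]  # [500, 250, 125, 125]

# Pearson statistic n * dist(d, d_emp) = sum of (observed - expected)^2 / expected
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_3df(x):
    """P(X >= x) for X ~ chi-square with 3 degrees of freedom (closed form, 3 d.o.f. only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_3df(statistic)
reject_h0 = p_value < 0.01  # threshold 1 - alpha with alpha = 0.99
print(statistic, p_value, reject_h0)
```

The closed form is specific to 3 degrees of freedom; for a different number of classes, a general χ² survival function would be needed instead.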

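  6. Joint distributions.
    • Find the Bernoulli distribution Bern(p) on {0, 1} such that the product Bern(p) ⊗ Bern(p) is given by

$$\begin{aligned} (\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(0, 0) &= \frac{1}{16} \\ (\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(0, 1) &= \frac{3}{16} \\ (\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(1, 0) &= \frac{3}{16} \\ (\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(1, 1) &= \frac{9}{16} \end{aligned}$$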

  • What is the covariance of this joint distribution?
  • Compute the marginals of this joint distribution.
  • Consider the map AND : {0, 1} × {0, 1} → {0, 1} given by the usual boolean AND operation (0 AND 0 = 0 AND 1 = 1 AND 0 = 0, 1 AND 1 = 1). Compute the pushforward of Bern(p) ⊗ Bern(p) under the map AND. (Simply apply the definition; this is very similar to an independent +, but it uses AND instead.)
  • Consider now the distribution d on {0, 1} × {0, 1} given by

d(0, 0) =

d(0, 1) =

d(1, 0) =

d(1, 1) = 0

Show that d is a joint probability distribution but that it is not a product measure.

  • Compute the marginals of d as well as their means and variances.
  • Compute the covariance and the correlation of d. What can you deduce from the sign of the correlation coefficient? Is this observation reflected in the definition of d?

Answer:

  • Observe that by definition

$$(\mathrm{Bern}(p) \otimes \mathrm{Bern}(p))(0, 0) = \mathrm{Bern}(p)(0) \cdot \mathrm{Bern}(p)(0) = (\mathrm{Bern}(p)(0))^2 = \frac{1}{16}$$

and thus Bern(p)(0) = √(1/16) = 1/4. It immediately follows that Bern(p)(1) = 3/4 and thus p = 3/4, since p is the probability of success (that is to say the probability mass of {1}). We could have used the same argument using (Bern(p) ⊗ Bern(p))(1, 1) = 9/16 to directly get the same answer.

  • The covariance of a product of distributions is always equal to zero (see lectures).
  • The marginals of a product of distributions are given by the components of the product; in this case both marginals are given by Bern(3/4).
  • By definition of the pushforward we get

$$\begin{aligned} (\mathrm{Bern}(3/4)\ \mathrm{AND}\ \mathrm{Bern}(3/4))(0) &= (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))\{(x, y) \mid x\ \mathrm{AND}\ y = 0\} \\ &= (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))(0, 0) + (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))(0, 1) + (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))(1, 0) \\ &= \frac{1}{16} + \frac{3}{16} + \frac{3}{16} = \frac{7}{16} \end{aligned}$$

It follows that (Bern(3/4) AND Bern(3/4))(1) = 1 − (Bern(3/4) AND Bern(3/4))(0) = 9/16 and so

$$\mathrm{Bern}(3/4)\ \mathrm{AND}\ \mathrm{Bern}(3/4) = \mathrm{Bern}(9/16)$$

Alternatively, we could have started with 1 and immediately gotten that

$$\begin{aligned} (\mathrm{Bern}(3/4)\ \mathrm{AND}\ \mathrm{Bern}(3/4))(1) &= (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))\{(x, y) \mid x\ \mathrm{AND}\ y = 1\} \\ &= (\mathrm{Bern}(3/4) \otimes \mathrm{Bern}(3/4))(1, 1) \\ &= \frac{9}{16} \end{aligned}$$

More generally, Bern(p) AND Bern(p) = Bern(p²).

  • The proof is by contradiction. Suppose that d is the product of two distributions, that is to say d = d₁ ⊗ d₂. Then d(1, 1) = (d₁ ⊗ d₂)(1, 1) = d₁(1) d₂(1) = 0, which means that either d₁(1) = 0 or d₂(1) = 0. The first possibility implies that

$$d(1, 0) = (d_1 \otimes d_2)(1, 0) = d_1(1)\, d_2(0) = 0$$

which contradicts the fact that d(1, 0) = 3/8. The second possibility leads to a contradiction for the same reason. So d cannot be a product of two distributions.
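The computations for Bern(3/4) ⊗ Bern(3/4) in this answer (marginals, covariance, pushforward under AND) can be reproduced with a small dictionary-based sketch. All function names below are illustrative. The helper `is_product` relies on the fact that a joint distribution is a product measure exactly when it equals the product of its own marginals; it is only demonstrated on the Bernoulli product, since the full table of values of d is not reproduced in this text.

```python
from fractions import Fraction
from itertools import product

def bernoulli(p):
    """Bern(p) as a dict mapping outcome to probability mass."""
    return {0: 1 - p, 1: p}

def product_measure(d1, d2):
    """Product measure: (d1 x d2)(x, y) = d1(x) * d2(y)."""
    return {(x, y): d1[x] * d2[y] for x, y in product(d1, d2)}

def marginals(joint):
    """Marginal distributions of a joint distribution on pairs."""
    m1, m2 = {}, {}
    for (x, y), mass in joint.items():
        m1[x] = m1.get(x, 0) + mass
        m2[y] = m2.get(y, 0) + mass
    return m1, m2

def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X] E[Y]."""
    ex = sum(x * m for (x, _), m in joint.items())
    ey = sum(y * m for (_, y), m in joint.items())
    exy = sum(x * y * m for (x, y), m in joint.items())
    return exy - ex * ey

def pushforward(joint, f):
    """Pushforward of a joint distribution under a map f(x, y)."""
    out = {}
    for (x, y), mass in joint.items():
        z = f(x, y)
        out[z] = out.get(z, 0) + mass
    return out

def is_product(joint):
    """True when the joint equals the product of its own marginals."""
    m1, m2 = marginals(joint)
    return joint == product_measure(m1, m2)

p = Fraction(3, 4)
joint = product_measure(bernoulli(p), bernoulli(p))
and_dist = pushforward(joint, lambda x, y: x & y)
print(and_dist[0], and_dist[1], covariance(joint), is_product(joint))  # 7/16 9/16 0 True
```

Applying `is_product` to a table for d with d(1, 1) = 0 and d(1, 0) = 3/8 would return `False`, in line with the contradiction argument above.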