













These notes cover the concept of independent and mutually independent random variables: the definition, examples, and implications of independent random variables, as well as the concept of mutual independence. They also explore the relationship between the joint distribution and independence, and give examples of independent and mutually independent random variables with normal, exponential, and Poisson distributions.
Typology: Study notes














Two random variables are independent if, for all $x$ and $y$,
$$p_{X,Y}(x, y) = p_X(x)\,p_Y(y) \tag{1}$$
or
$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y) \tag{2}$$
This is directly analogous to the general probability rules. The conditional probability mass and density functions are then just:
$$p_{X|Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)} = \frac{p_X(x)\,p_Y(y)}{p_Y(y)} = p_X(x)$$
$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{f_X(x)\,f_Y(y)}{f_Y(y)} = f_X(x)$$
Dependence is any violation of this condition.
Let $f_{X_1,X_2}(x_1, x_2) = (x_1 + x_2)\,\mathbb{1}\{0 < x_1 < 1,\ 0 < x_2 < 1\}$. Are $X_1$ and $X_2$ independent?
Well...:
$$f_1(x_1) = \int_0^1 f(x_1, x_2)\,dx_2 = \int_0^1 (x_1 + x_2)\,dx_2 = \left(x_1 + \tfrac{1}{2}\right)\mathbb{1}\{0 < x_1 < 1\}$$
$$f_2(x_2) = \int_0^1 f(x_1, x_2)\,dx_1 = \int_0^1 (x_1 + x_2)\,dx_1 = \left(\tfrac{1}{2} + x_2\right)\mathbb{1}\{0 < x_2 < 1\}$$
But $x_1 + x_2 \neq (x_1 + 1/2)(x_2 + 1/2)$. The answer is no.
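As a quick numerical sanity check, here is a minimal Python sketch (standard library only; the helper names `joint_pdf`, `f1`, `f2` are illustrative, not from the notes) that integrates out each variable with the midpoint rule and compares the joint density with the product of the marginals at one point:

```python
def joint_pdf(x1, x2):
    """f(x1, x2) = x1 + x2 on the open unit square, 0 elsewhere."""
    return x1 + x2 if 0 < x1 < 1 and 0 < x2 < 1 else 0.0

def midpoint_integral(g, steps=10_000):
    """Midpoint-rule approximation of the integral of g over (0, 1)."""
    h = 1.0 / steps
    return h * sum(g((k + 0.5) * h) for k in range(steps))

def f1(x1):
    return midpoint_integral(lambda x2: joint_pdf(x1, x2))   # integrate out x2

def f2(x2):
    return midpoint_integral(lambda x1: joint_pdf(x1, x2))   # integrate out x1

x1, x2 = 0.1, 0.2
print(f1(x1), f2(x2))                        # close to 0.6 and 0.7
print(joint_pdf(x1, x2), f1(x1) * f2(x2))    # about 0.3 versus about 0.42
```

One point where the joint density differs from the product of the marginals is enough to rule out independence.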
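In general, for $X_1, X_2 \sim f(x_1, x_2)$, if $f(x_1, x_2) = g(x_1)\,h(x_2)$ for all $(x_1, x_2)$ (including the indicator of the support, which must then be a product set), this implies that $X_1$ is independent of $X_2$.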
This is not a formal “proof”^1: we know that we can always write a joint pdf as the product of a conditional and a marginal,
$$f_{X_1,X_2}(x_1, x_2) = f_{X_2|X_1}(x_2 \mid x_1)\,f_{X_1}(x_1)$$
If the functional form of $f_{X_2|X_1}(x_2 \mid x_1)$ does not depend on $x_1$, say $f_{X_2|X_1}(x_2 \mid x_1) = h(x_2)$, then integrate both sides over $x_1$:
$$\int f_{X_1,X_2}(x_1, x_2)\,dx_1 = \int f_{X_2|X_1}(x_2 \mid x_1)\,f_{X_1}(x_1)\,dx_1$$
which yields
$$f_{X_2}(x_2) = h(x_2) \int f_{X_1}(x_1)\,dx_1$$
and, since $\int f_{X_1}(x_1)\,dx_1 = 1$, of course
$$f_{X_2}(x_2) = h(x_2) = f_{X_2|X_1}(x_2 \mid x_1)$$
^1 But then, what is these days...?
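More generally, $X_1, \dots, X_n$ are mutually independent if, for all $x_1, \dots, x_n$,
$$P(X_1 \le x_1, \dots, X_n \le x_n) = \prod_{i=1}^{n} P(X_i \le x_i) \tag{3}$$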
Recall our result for expectations of sums of random variables:
$$E(X_1 + \cdots + X_n) = \sum_{i=1}^{n} E(X_i)$$
This result holds regardless of the dependence structure of the joint distribution $f(x_1, \dots, x_n)$.
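For a product of random variables,
$$E(X_1 \cdots X_n) = \prod_{i=1}^{n} E(X_i)$$
is guaranteed only under independence. Without independence it fails in general, although some dependent (for example, uncorrelated) variables happen to satisfy it: independence is sufficient, not necessary.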
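To make "sufficient, not necessary" concrete, here is a small exact computation in Python with `fractions.Fraction` (the helper `expect` and both example distributions are illustrative choices, not from the notes): two independent dice satisfy the product rule, and so does the dependent pair $X$ uniform on $\{-1, 0, 1\}$, $Y = X^2$.

```python
from fractions import Fraction
from itertools import product

def expect(dist, fn):
    """E[fn(outcome)] for a discrete distribution given as {outcome: probability}."""
    return sum(p * fn(o) for o, p in dist.items())

# Two independent fair dice: the joint pmf factors, so E(XY) = E(X) E(Y).
die = range(1, 7)
dice = {(a, b): Fraction(1, 36) for a, b in product(die, die)}
print(expect(dice, lambda o: o[0] * o[1]),
      expect(dice, lambda o: o[0]) * expect(dice, lambda o: o[1]))   # 49/4 49/4

# Dependent pair: X uniform on {-1, 0, 1} and Y = X**2.
dep = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}
print(expect(dep, lambda o: o[0] * o[1]),
      expect(dep, lambda o: o[0]) * expect(dep, lambda o: o[1]))     # 0 0

# Yet X and Y are dependent: P(X = 1, Y = 0) = 0, while P(X = 1) P(Y = 0) = 1/9.
p_x1 = sum(p for (x, y), p in dep.items() if x == 1)
p_y0 = sum(p for (x, y), p in dep.items() if y == 0)
p_both = sum(p for (x, y), p in dep.items() if x == 1 and y == 0)
print(p_both, p_x1 * p_y0)                                           # 0 1/9
```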
Asserting that $X_1, \dots, X_n$ are mutually independent implies that any $X_i$ is independent of any $X_j$.^2 This strongest condition requires that the joint distribution be completely factorable:
$$f(x_1, \dots, x_n) = \prod_{i=1}^{n} f_i(x_i)$$
But pairwise independence ($X_i$ independent of $X_j$ for every pair $i \neq j$) does not imply mutual independence.
Example:
Let
$$f(x_1, x_2, x_3) = \tfrac{1}{4}\,\mathbb{1}\{(x_1, x_2, x_3) \in \{(1,0,0), (0,1,0), (0,0,1), (1,1,1)\}\}$$
which implies
$$f_{ij}(x_i, x_j) = \tfrac{1}{4}\,\mathbb{1}\{(x_i, x_j) \in \{(0,0), (1,0), (0,1), (1,1)\}\}$$
and
$$f_i(x_i) = \tfrac{1}{2}\,\mathbb{1}\{x_i \in \{0, 1\}\}$$
Thus $f_i(x_i)\,f_j(x_j) = 1/4 = f_{ij}(x_i, x_j)$, so any $X_i$ and $X_j$ are independent. BUT: $f_i(x_i)\,f_j(x_j)\,f_k(x_k) = 1/8 \neq 1/4$ on the support, so there is not complete mutual independence.
^2 As well as independence of subsets of greater length.
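The example can be checked exhaustively. This Python sketch (exact arithmetic with `fractions.Fraction`; the helpers `marginal` and `p_of` are hypothetical names) computes every pairwise and single marginal from the joint pmf and tests both factorization conditions on all of $\{0, 1\}^2$ and $\{0, 1\}^3$:

```python
from fractions import Fraction
from itertools import product

# Joint pmf from the example: 1/4 on each of four points of {0, 1}^3.
support = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
pmf = {pt: Fraction(1, 4) for pt in support}

def marginal(idx):
    """pmf of the sub-vector (X_i for i in idx), summing the joint pmf."""
    out = {}
    for pt, p in pmf.items():
        key = tuple(pt[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out

def p_of(dist, key):
    return dist.get(key, Fraction(0))

singles = [marginal((i,)) for i in range(3)]

# Pairwise: f_ij(a, b) == f_i(a) f_j(b) for every pair (i, j) and every (a, b).
pairwise = all(
    p_of(marginal((i, j)), (a, b)) == p_of(singles[i], (a,)) * p_of(singles[j], (b,))
    for i, j in [(0, 1), (0, 2), (1, 2)]
    for a, b in product((0, 1), repeat=2)
)

# Mutual: f(x1, x2, x3) == f_1(x1) f_2(x2) f_3(x3) for every point of {0, 1}^3.
mutual = all(
    p_of(pmf, x) == p_of(singles[0], (x[0],)) * p_of(singles[1], (x[1],)) * p_of(singles[2], (x[2],))
    for x in product((0, 1), repeat=3)
)
print(pairwise, mutual)   # True False
```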
3 Distributions for Sums of Independent Random Variables
Let $X \perp Y$. Calculate the distribution of $Z = X + Y$:
$$F_Z(z) = F_{X+Y}(z) = P(X + Y \le z) = \iint_{x + y \le z} f_X(x)\,f_Y(y)\,dx\,dy$$
by the independence of $X$ and $Y$. Writing the region $x + y \le z$ as $x \le z - y$ and integrating over $x$ first,
$$F_{X+Y}(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-y} f_X(x)\,f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} F_X(z - y)\,f_Y(y)\,dy \tag{4}$$
This is the cumulative distribution function (cdf) of $Z$, and it is the general form for the cdf of the sum of two independent random variables.
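Taking a derivative with respect to $z$ yields the pdf of $Z$, the convolution of $f_X$ and $f_Y$:
$$f_Z(z) = \frac{d}{dz} F_Z(z) = \frac{d}{dz} \int_{-\infty}^{\infty} F_X(z - y)\,f_Y(y)\,dy = \int_{-\infty}^{\infty} f_X(z - y)\,f_Y(y)\,dy \tag{5}$$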
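Equation (5) can be evaluated numerically. As a minimal sketch, assuming two independent $U(0, 1)$ variables (an illustrative choice, not an example from the notes), the midpoint rule applied to the convolution integral recovers the triangular density of their sum; `convolution_pdf` and its `lo`/`hi` integration bounds are hypothetical names for this sketch.

```python
def uniform_pdf(x):
    """U(0, 1) density."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def convolution_pdf(z, f_x, f_y, lo, hi, steps=30_000):
    """Approximate f_Z(z) = integral of f_X(z - y) f_Y(y) dy by the midpoint rule.

    Assumes [lo, hi] covers the support of f_Y.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += f_x(z - y) * f_y(y)
    return total * h

# Exact answer for this case: f_Z(z) = z on (0, 1] and 2 - z on (1, 2).
for z in (0.5, 1.0, 1.5):
    print(z, round(convolution_pdf(z, uniform_pdf, uniform_pdf, 0.0, 1.0), 3))
```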
4 Joint Distributions of Functions of a Random Variable
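We’ve seen the univariate case^4; let’s state it explicitly.
Take $X \sim F_X$, with $F_X$ known. What is the density $f_Y$ if $Y = g(X)$, with $g$ an invertible bijection from the space of $X$ to the space of $Y$? Assume for now that $g$ is increasing (if $g$ is decreasing, the inequality below flips and the derivative picks up a minus sign, which is why an absolute value appears in the general formula).
Well...:
$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y))$$
Thus:
$$f_Y(y) = \frac{d}{dy} F_X(g^{-1}(y)) = f_X(g^{-1}(y)) \cdot \frac{d}{dy} g^{-1}(y)$$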
Let $X \sim f_X(x) = 2x\,\mathbb{1}\{0 < x < 1\}$ and let $Y = 8X^3$, so $Y \in (0, 8)$. What we do in general is use the known distribution of $X$ to generate the distribution of $Y$.
The event $\{a < Y < b\}$ on the space of $Y$ (the image of the transformation) is the same as the event $\{\tfrac{1}{2}a^{1/3} < X < \tfrac{1}{2}b^{1/3}\}$ on the space of $X$ (the pre-image of the transformation). So...
$$P(a < Y < b) = P\left(\tfrac{1}{2}a^{1/3} < X < \tfrac{1}{2}b^{1/3}\right)$$
which is just the integral of the pdf of $X$ between those limits:
$$\int_{a^{1/3}/2}^{b^{1/3}/2} 2x\,dx$$
We change the variable from $x$ to $y$ in the integral, to get a statement using the pdf of $X$ but in terms of $y$: $x = g^{-1}(y) = \tfrac{1}{2}y^{1/3}$, which implies $\frac{dx}{dy} = \frac{d}{dy}g^{-1}(y) = \frac{1}{6y^{2/3}}$.^5
^4 Refer to past lectures, where we investigated the distribution of $Y = X^2$ when $X \sim U(-1, 1)$.
^5 Aside: the derivative of the inverse of a function $f$ is $\frac{d}{dx}\left[f^{-1}(x)\right] = \frac{1}{f'(f^{-1}(x))}$.
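This yields:
$$P(a < Y < b) = \int_a^b 2\left(\tfrac{1}{2}y^{1/3}\right)\frac{1}{6y^{2/3}}\,dy$$
Here the pdf $f_Y(y)$ (heuristically, $P(Y \in y \pm \varepsilon) \approx 2\varepsilon\,f_Y(y)$) is the integrand: $f_Y(y) = \frac{1}{6y^{1/3}}\,\mathbb{1}\{0 < y < 8\}$.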
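A simulation gives an independent check of this density. The Python sketch below (standard library only; the seed, sample size, and the interval $(1, 8)$ are arbitrary illustrative choices) samples $X$ by inverse-CDF sampling, since $F_X(x) = x^2$, and compares the empirical $P(1 < Y < 8)$ with the midpoint-rule integral of the derived $f_Y$. By hand, both should be $3/4$, which also equals $P(\tfrac{1}{2} < X < 1)$.

```python
import random

random.seed(0)

def sample_x():
    """Draw X with f_X(x) = 2x on (0, 1): F_X(x) = x^2, so X = sqrt(U)."""
    return random.random() ** 0.5

def f_y(y):
    """The density derived above: f_Y(y) = 1 / (6 y^(1/3)) on (0, 8)."""
    return 1.0 / (6.0 * y ** (1.0 / 3.0)) if 0.0 < y < 8.0 else 0.0

n = 200_000
ys = [8.0 * sample_x() ** 3 for _ in range(n)]
empirical = sum(1.0 < y < 8.0 for y in ys) / n

# Midpoint rule for the integral of f_Y over (1, 8); by hand it is (8^(2/3) - 1) / 4 = 3/4.
steps = 10_000
h = 7.0 / steps
theoretical = h * sum(f_y(1.0 + (k + 0.5) * h) for k in range(steps))
print(round(empirical, 3), round(theoretical, 4))
```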
More generally:
$$P(a < Y < b) = \int_a^b f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|dy$$
so
$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|$$
Remark: 1) We found the pdf $f_Y$ by changing the variable in the definite integral for the probability of $\{g^{-1}(a) < X < g^{-1}(b)\}$, with $a, b$ on the space of $Y$. 2) We really only needed: i) the conditions under which $f_Y > 0$, and ii) the integrand for $P(a < Y < b)$. To keep that integrand a valid density we use $\left|\frac{d}{dy}g^{-1}(y)\right|$, which is always non-negative.
So there are two conditions for the change-of-variable formula: 1) verify that $Y = g(X)$ is a bijection, and 2) take the absolute value of the rate of change of the inverse transformation. These two rules yield the univariate formula for the pdf of a function $g(X)$: $f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|$.
In general, we call $\left|\frac{d}{dy}g^{-1}(y)\right|$ the absolute value of the Jacobian: it is the rate at which the inverse transformation stretches or shrinks lengths, and so it rescales the density. The absolute value is necessary so that $f_Y \ge 0$.
Let $f_X(x) = \mathbb{1}\{0 < x < 1\}$ and $Y = g(X) = -2\ln(X)$. The inverse is $g^{-1}(y) = e^{-y/2}$. Here $\frac{d}{dy}g^{-1}(y) = -\tfrac{1}{2}e^{-y/2} < 0$, so it is important to remember to take the absolute value.
Thus:
$$f_Y(y) = f_X(g^{-1}(y))\,|J| = f_X(e^{-y/2})\left|-\tfrac{1}{2}e^{-y/2}\right| = \tfrac{1}{2}e^{-y/2}\,\mathbb{1}\{y > 0\}$$
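A simulation sketch in Python (standard library; seed and sample size are arbitrary) checks two consequences of $f_Y(y) = \tfrac{1}{2}e^{-y/2}$: the mean of $Y$ should be near 2, and $P(Y \le 2)$ should be near $1 - e^{-1}$.

```python
import math
import random

random.seed(1)
n = 200_000
# U = 1 - random.random() lies in (0, 1], so log(U) is always defined.
ys = [-2.0 * math.log(1.0 - random.random()) for _ in range(n)]

mean = sum(ys) / n                        # the density (1/2) e^{-y/2} has mean 2
p_le_2 = sum(y <= 2.0 for y in ys) / n    # its cdf at 2 is 1 - e^{-1}
print(round(mean, 2), round(p_le_2, 3), round(1 - math.exp(-1), 3))
```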
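Let $(X_1, X_2) \sim f_{X_1,X_2}(x_1, x_2) = \mathbb{1}\{0 < x_1 < 1,\ 0 < x_2 < 1\}$, and let $Y_1 = g_1(X_1, X_2) = X_1 + X_2$ and $Y_2 = g_2(X_1, X_2) = X_1 - X_2$.
This implies $X_1 = g_1^{-1}(Y_1, Y_2) = \frac{Y_1 + Y_2}{2}$ and $X_2 = g_2^{-1}(Y_1, Y_2) = \frac{Y_1 - Y_2}{2}$.
Figure 1: Illustration of the transformation $Y_1 = g_1(X_1, X_2) = X_1 + X_2$ and $Y_2 = g_2(X_1, X_2) = X_1 - X_2$. Note the boundaries of figures A and B.
Then
$$|J| = \left|\det\begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{pmatrix}\right| = \left|\det\begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}\right| = \left|-\tfrac{1}{4} - \tfrac{1}{4}\right| = \tfrac{1}{2}$$
Thus $f_{Y_1,Y_2}(y_1, y_2) = \tfrac{1}{2}\,\mathbb{1}\{(y_1, y_2) \in B\}$,
where $B = \{(y_1, y_2) : 0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\}$, the square with corners $(0,0)$, $(1,1)$, $(2,0)$, $(1,-1)$ inside the rectangle $0 < y_1 < 2$, $-1 < y_2 < 1$, as illustrated in figure 1.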
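The mechanics of the bivariate formula can be sketched in Python: approximate the Jacobian matrix of the inverse map by central differences, take the absolute determinant, and multiply by $f_X$ evaluated at the pre-image. The names `inverse_map`, `jacobian_det`, and `f_y` are hypothetical helpers for this sketch; the two test points show that $(1, 0.9)$ lies in $B$ while $(0.5, 0.9)$, although inside the rectangle, does not.

```python
def inverse_map(y1, y2):
    """g^{-1}: (y1, y2) -> (x1, x2) for Y1 = X1 + X2, Y2 = X1 - X2."""
    return (y1 + y2) / 2.0, (y1 - y2) / 2.0

def jacobian_det(inv, y1, y2, h=1e-6):
    """Determinant of the matrix of partials dx_i/dy_j, by central differences."""
    d11 = (inv(y1 + h, y2)[0] - inv(y1 - h, y2)[0]) / (2 * h)
    d12 = (inv(y1, y2 + h)[0] - inv(y1, y2 - h)[0]) / (2 * h)
    d21 = (inv(y1 + h, y2)[1] - inv(y1 - h, y2)[1]) / (2 * h)
    d22 = (inv(y1, y2 + h)[1] - inv(y1, y2 - h)[1]) / (2 * h)
    return d11 * d22 - d12 * d21

def f_y(y1, y2):
    """f_{Y1,Y2} = f_X(g^{-1}(y)) |J|, with f_X uniform on the unit square."""
    x1, x2 = inverse_map(y1, y2)
    inside = 0.0 < x1 < 1.0 and 0.0 < x2 < 1.0
    return abs(jacobian_det(inverse_map, y1, y2)) if inside else 0.0

print(round(jacobian_det(inverse_map, 1.0, 0.3), 6))   # -0.5, so |J| = 1/2
print(round(f_y(1.0, 0.9), 6), f_y(0.5, 0.9))          # 0.5 inside B, 0.0 outside B
```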
The extension to higher dimensions is natural and direct.
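Let $X_1, X_2$ be independent $\text{Exp}(\lambda)$, so $(x_1, x_2) \in [0, \infty) \times [0, \infty)$ and $f_{X_i}(x_i) = \lambda e^{-\lambda x_i}$.
Let the transformation be $Y_1 = \frac{X_1}{X_1 + X_2}$ and $Y_2 = X_1 + X_2$, so $(y_1, y_2) \in (0, 1) \times (0, \infty)$.
Then $x_1 = g_1^{-1}(y_1, y_2) = y_1 y_2$ and $x_2 = g_2^{-1}(y_1, y_2) = y_2(1 - y_1)$, and
$$|J| = \left|\det\begin{pmatrix} y_2 & y_1 \\ -y_2 & 1 - y_1 \end{pmatrix}\right| = |y_2(1 - y_1) + y_1 y_2| = y_2$$
If $\lambda = 1$, then $f_{X_1,X_2}(x_1, x_2) = e^{-(x_1 + x_2)}$ and thus $f_{Y_1,Y_2}(y_1, y_2) = y_2 e^{-y_2}\,\mathbb{1}\{0 < y_2 < \infty\}\cdot\mathbb{1}\{0 < y_1 < 1\}$.
This density factors, which implies $Y_1 \perp Y_2$, $Y_2 \sim \Gamma(2, 1)$, and $Y_1 \sim U(0, 1)$.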
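A Monte Carlo sketch in Python (standard library; seed, sample size, and the cutoffs 0.3 and 2 are arbitrary illustrative choices) checks the three conclusions with $\lambda = 1$: $P(Y_1 < 0.3)$ should be near 0.3, $P(Y_2 < 2)$ near the $\Gamma(2, 1)$ value $1 - 3e^{-2}$, and the joint probability near the product of the two.

```python
import math
import random

random.seed(2)
n = 200_000
y1s, y2s = [], []
for _ in range(n):
    x1, x2 = random.expovariate(1.0), random.expovariate(1.0)   # lambda = 1
    y1s.append(x1 / (x1 + x2))
    y2s.append(x1 + x2)

p_a = sum(y < 0.3 for y in y1s) / n          # U(0, 1): near 0.3
p_b = sum(y < 2.0 for y in y2s) / n          # Gamma(2, 1) cdf at 2: 1 - 3 e^{-2}
p_ab = sum(a < 0.3 and b < 2.0 for a, b in zip(y1s, y2s)) / n
print(round(p_a, 3), round(p_b, 3), round(1 - 3 * math.exp(-2), 3))
print(round(p_ab, 3), round(p_a * p_b, 3))   # close, as independence predicts
```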
6 Order Statistics
Now we return to statistics^6 in this class, moving our focus from the random processes (variables) that generate data to drawing inference about a supposed process.
The order statistics are the ordered values of a random process. Here the order statistics are not observed values, so we write them with capital X’s instead of lowercase x’s.
Let $X_1, \dots, X_n$ be independent and identically distributed random variables (we say the X’s are ‘i.i.d.’). The ordered values are written $X_{(1)}, \dots, X_{(n)}$, with
$$X_{(1)} \equiv \min(X_1, \dots, X_n), \qquad X_{(n)} \equiv \max(X_1, \dots, X_n)$$
and $X_{(k)}$ the $k$th smallest value in $X_1, \dots, X_n$. For example, for $X_{(1)}, \dots, X_{(2n+1)}$, $X_{(n+1)}$ is the median.
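For a realized sample, the order statistics are just the sorted values. A tiny Python illustration with made-up data (`order_statistics` is a hypothetical helper; note that Python indexes from 0, so $x_{(k)}$ is `ordered[k - 1]`):

```python
def order_statistics(sample):
    """Return (x_(1), ..., x_(n)): the sample sorted in increasing order."""
    return sorted(sample)

xs = [3.2, 1.5, 4.8, 0.7, 2.9]           # made-up data; n = 5 = 2*2 + 1
ordered = order_statistics(xs)
print(ordered[0], ordered[-1], ordered[2])   # minimum, maximum, median x_(3): 0.7 4.8 2.9
```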
7 Distributions for Order Statistics
As a first example, let $X_1, X_2 \sim f_{X_1,X_2}(x_1, x_2)$ with $X_1 \perp X_2$.
^6 A statistic is any function of observed data. In our language, in this class, observed data are samples $x_1, \dots, x_n$ from a random process $X$.
8 Distributions of Order Statistics
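Starting with the pdf for the maximum $X_{(n)}$: heuristically, $f_{X_{(n)}}(x)$ is, per unit of length, the probability that one of $X_1, \dots, X_n$ lands in $x \pm \varepsilon$ while all the others are less than $x$. Any of the $n$ variables can be the one at $x$, so
$$f_{X_{(n)}}(x) = n \cdot f_X(x) \left[\int_{-\infty}^{x} f_X(t)\,dt\right]^{n-1} = n\,f_X(x)\,[F_X(x)]^{n-1}$$
The pdf for the minimum $X_{(1)}$ can be found similarly, now with all the others greater than $x$:
$$f_{X_{(1)}}(x) = n \cdot f_X(x) \left[\int_{x}^{\infty} f_X(t)\,dt\right]^{n-1} = n\,f_X(x)\,[1 - F_X(x)]^{n-1}$$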
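Integrating these densities gives $P(X_{(n)} \le x) = [F_X(x)]^n$ and $P(X_{(1)} > x) = [1 - F_X(x)]^n$. A simulation sketch, assuming $U(0, 1)$ samples of size $n = 5$ (illustrative values, not from the notes), checks both at once: with $x = 0.8$, each probability should be near $0.8^5$.

```python
import random

random.seed(3)
n, reps, x = 5, 100_000, 0.8     # illustrative sample size and evaluation point

maxima = [max(random.random() for _ in range(n)) for _ in range(reps)]
minima = [min(random.random() for _ in range(n)) for _ in range(reps)]

# For U(0, 1), F_X(x) = x. Integrating n f_X [F_X]^(n-1) gives P(X_(n) <= x) = x^n,
# and integrating n f_X [1 - F_X]^(n-1) gives P(X_(1) > 1 - x) = x^n.
p_max = sum(m <= x for m in maxima) / reps
p_min = sum(m > 1 - x for m in minima) / reps
print(round(p_max, 3), round(p_min, 3), round(x ** n, 3))
```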
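The joint distribution of the entire set of order statistics can be ‘derived’ heuristically from what we know about pdfs of transformations.
Let $X_1, \dots, X_n \sim f_X$, i.i.d., and set $Y_k = X_{(k)}$, the $k$th order statistic. Then each $Y_k$ is a transformation, $Y_k = g(X_1, \dots, X_n)$. Once we fix which ordering of the sample occurred, the map from $(x_1, \dots, x_n)$ to $(y_1, \dots, y_n)$ only relabels coordinates, so the Jacobian matrix $\left(\frac{\partial x_i}{\partial y_j}\right)_{i,j=1,\dots,n}$ is a permutation matrix with determinant $\pm 1$, and $|J| = 1$. For $n = 3$, for instance, take the outcome $y_1 = x_1^*$, $y_2 = x_2^*$, $y_3 = x_3^*$: the Jacobian is 1 on the diagonal and 0 elsewhere, yielding a determinant of 1. But there are $3!$ possible arrangements, each contributing the same density, so the pdf is $f_{X_{(1)},X_{(2)},X_{(3)}}(x_{(1)}, x_{(2)}, x_{(3)}) = 3! \cdot f_X(x_{(1)})\,f_X(x_{(2)})\,f_X(x_{(3)})$ for $x_{(1)} < x_{(2)} < x_{(3)}$. In general the result is
$$f_{X_{(1)},\dots,X_{(n)}}(x_{(1)}, \dots, x_{(n)}) = n! \cdot f_X(x_{(1)}) \cdots f_X(x_{(n)}) \cdot \mathbb{1}\{x_{(1)} < \cdots < x_{(n)}\}$$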
With the full joint pdf you can compute the marginal density of any one or several of $X_{(1)}, \dots, X_{(n)}$ by integrating out the remaining variables. The pdf of any $X_{(j)}$ is
$$f_{X_{(j)}}(x_{(j)}) = \frac{n!}{(n - j)!\,(j - 1)!}\,[F_X(x_{(j)})]^{j-1}\,[1 - F_X(x_{(j)})]^{n-j}\,f_X(x_{(j)})$$
The joint pdf of any $X_{(i)}, X_{(j)}$ with $i < j$ is, for $x_{(i)} < x_{(j)}$,
$$f_{X_{(i)},X_{(j)}}(x_{(i)}, x_{(j)}) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,[F_X(x_{(i)})]^{i-1}\,[F_X(x_{(j)}) - F_X(x_{(i)})]^{j-i-1}\,[1 - F_X(x_{(j)})]^{n-j}\,f_X(x_{(i)})\,f_X(x_{(j)})$$
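There is an attractive heuristic for these pdfs: $f_{X_{(i)},X_{(j)}}$ is a multinomial probability. $[F_X(x_{(i)})]^{i-1}$ is the probability that $i - 1$ values fall below the $i$th order statistic $x_{(i)}$; $[F_X(x_{(j)}) - F_X(x_{(i)})]^{j-i-1}$ is the probability that the $j - i - 1$ values between the $i$th and $j$th order statistics fall there; $[1 - F_X(x_{(j)})]^{n-j}$ is the probability that the remaining $n - j$ values exceed the $j$th order statistic. The factors $f_X(x_{(i)})$ and $f_X(x_{(j)})$ account for the two order statistics themselves, and the multinomial coefficient counts the ways to assign the $n$ observations to these five groups.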
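The two formulas must agree with each other: integrating $x_{(j)}$ out of the pair density has to give the density of $X_{(i)}$. A Python sketch of that consistency check, assuming $U(0, 1)$ samples with $n = 5$, $i = 2$, $j = 4$ evaluated at $u = 0.3$ (all illustrative choices; the function names are hypothetical). By hand, both sides equal $20 \cdot 0.3 \cdot 0.7^3 = 2.058$.

```python
from math import factorial

def pdf_single(x, j, n, F, f):
    """Density of X_(j) for an i.i.d. sample of size n with cdf F and pdf f."""
    c = factorial(n) / (factorial(j - 1) * factorial(n - j))
    return c * F(x) ** (j - 1) * (1 - F(x)) ** (n - j) * f(x)

def pdf_pair(u, v, i, j, n, F, f):
    """Joint density of (X_(i), X_(j)), i < j, at (u, v); zero unless u < v."""
    if not u < v:
        return 0.0
    c = factorial(n) / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
    return (c * F(u) ** (i - 1) * (F(v) - F(u)) ** (j - i - 1)
            * (1 - F(v)) ** (n - j) * f(u) * f(v))

def F(x):
    """U(0, 1) cdf."""
    return min(max(x, 0.0), 1.0)

def f(x):
    """U(0, 1) pdf."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

n, i, j, u = 5, 2, 4, 0.3
steps = 20_000
h = (1.0 - u) / steps
# Midpoint rule over v in (u, 1): integrating X_(j) out should recover f_{X_(i)}(u).
integrated = h * sum(pdf_pair(u, u + (k + 0.5) * h, i, j, n, F, f) for k in range(steps))
print(round(integrated, 4), round(pdf_single(u, i, n, F, f), 4))
```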
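We have been calling $\tilde{x}$ the median, i.e. the value such that $F_X(\tilde{x}) = \tfrac{1}{2}$. For a sample of $2n + 1$ random variables $X_1, \dots, X_{2n+1}$, the sample median is $X_{(n+1)}$. Its distribution is
$$f_{\tilde{X}}(x) = f_{X_{(n+1)}}(x) = \frac{(2n+1)!}{(n+1-1)!\,(2n+1-(n+1))!}\,[F_X(x)]^{n+1-1}\,[1 - F_X(x)]^{2n+1-(n+1)}\,f_X(x) = \frac{(2n+1)!}{n!\,n!}\,[F_X(x)]^{n}\,[1 - F_X(x)]^{n}\,f_X(x)$$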
9 Exchangeable Random Variables
Random variables are called exchangeable if their joint distribution is invariant to permutations, i.e. if
$$P(X_{i_1} < x_1, \dots, X_{i_n} < x_n) = P(X_1 < x_1, \dots, X_n < x_n)$$
for any permutation $i_1, \dots, i_n$ of $1, \dots, n$.
The uniqueness of the MGF (in this class^7 ) yields that fX (x) = 10 x
Say $X \sim \text{Ber}(p)$. Then
$$E(e^{tX}) = p \cdot e^{t} + (1 - p) \cdot e^{0} = pe^{t} + (1 - p)$$
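Say $X \sim \text{Bin}(n, p)$. Then, by the binomial theorem,
$$E(e^{tX}) = \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^{k} (1-p)^{n-k} = \sum_{k=0}^{n} \binom{n}{k} (pe^{t})^{k} (1-p)^{n-k} = (pe^{t} + (1 - p))^{n}$$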
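A quick Python check that the term-by-term sum and the closed form agree (the parameters $n = 10$, $p = 0.3$ and the $t$ values are arbitrary; `math.comb` needs Python 3.8 or later):

```python
import math

def binomial_mgf_by_sum(t, n, p):
    """E[e^{tX}] for X ~ Bin(n, p), summing e^{tk} C(n, k) p^k (1 - p)^(n - k) over k."""
    return sum(math.exp(t * k) * math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))

def binomial_mgf_closed(t, n, p):
    """Closed form (p e^t + (1 - p))^n."""
    return (p * math.exp(t) + (1 - p)) ** n

for t in (-1.0, 0.0, 0.5):
    print(t, binomial_mgf_by_sum(t, 10, 0.3), binomial_mgf_closed(t, 10, 0.3))
```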
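Say $X \sim N(0, 1)$. Then, completing the square,
$$E[e^{tX}] = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{tx} e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-\frac{(x-t)^2}{2} + \frac{t^2}{2}}\,dx = e^{t^2/2} \cdot \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-\frac{(x-t)^2}{2}}\,dx = e^{t^2/2}$$
since $\frac{1}{\sqrt{2\pi}} e^{-(x-t)^2/2}$ is the $N(t, 1)$ density, which integrates to 1.
^7 I’m going to stop saying that now. The reason that the MGF does not always uniquely determine the pdf relates to the integrability condition on $E[e^{tX}]$: it must be finite for all $|t| < h$ for some $h > 0$. A generalization of the MGF, the so-called characteristic function $E(e^{itX})$, always exists and uniquely determines the distribution.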
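Say $X \sim \text{Exp}(\lambda)$. Then, for $t < \lambda$,
$$E[e^{tX}] = \int_{\mathbb{R}^+} e^{tx} \lambda e^{-\lambda x}\,dx = \lambda \int_{\mathbb{R}^+} e^{-(\lambda - t)x}\,dx = \frac{\lambda}{\lambda - t} = \frac{1}{1 - \beta t}$$
with $\beta = \frac{1}{\lambda}$.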
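Both closed forms above can be checked by direct numerical integration of $E[e^{tX}] = \int e^{tx} f_X(x)\,dx$. In this Python sketch the integrals are truncated to $[-12, 12]$ for the normal and $[0, 60]$ for the exponential (an assumption of the sketch; the neglected tails are far below floating-point precision here), with illustrative values $t = 0.5$ and $\lambda = 2$.

```python
import math

def mgf_numeric(t, pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of E[e^{tX}] = integral of e^{tx} pdf(x) dx over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += math.exp(t * x) * pdf(x)
    return total * h

def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

lam = 2.0
def exp_pdf(x):
    return lam * math.exp(-lam * x)

t = 0.5   # must satisfy t < lam for the exponential MGF to exist
print(mgf_numeric(t, std_normal_pdf, -12.0, 12.0), math.exp(t * t / 2.0))
print(mgf_numeric(t, exp_pdf, 0.0, 60.0), lam / (lam - t))
```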
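The eponymous result is that we can obtain moments directly from $M_X$. Take the first derivative of $M_X(t)$:
$$\frac{d}{dt} M_X(t) = M_X'(t) = \begin{cases} \int x e^{tx} f_X(x)\,dx & \text{(continuous)} \\ \sum_x x e^{tx} p_X(x) & \text{(discrete)} \end{cases}$$
This implies that $M_X'(t)\big|_{t=0} = E(X) = \mu$. Take another derivative:
$$\frac{d^2}{dt^2} M_X(t) = M_X''(t) = \begin{cases} \int x^2 e^{tx} f_X(x)\,dx & \text{(continuous)} \\ \sum_x x^2 e^{tx} p_X(x) & \text{(discrete)} \end{cases}$$
which implies that $M_X''(t)\big|_{t=0} = E(X^2)$. Thus $\sigma^2 = M_X''(0) - [M_X'(0)]^2$.
In general, $M_X^{(m)}(t)\big|_{t=0} = E(X^m)$: the $m$th derivative of the MGF at $t = 0$ is the $m$th moment of $X$.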
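To see the moment property in action, this Python sketch differentiates the exponential MGF $\lambda/(\lambda - t)$ numerically at $t = 0$ with central differences (the step size `h` and $\lambda = 2$ are illustrative choices). The results should be close to $E(X) = 1/\lambda$, $E(X^2) = 2/\lambda^2$, and $\sigma^2 = 1/\lambda^2$.

```python
def exp_mgf(t, lam):
    """MGF of Exp(lam): lam / (lam - t), valid for t < lam."""
    return lam / (lam - t)

def moments_from_mgf(mgf, h=1e-4):
    """First two moments from central differences of the MGF at t = 0."""
    m1 = (mgf(h) - mgf(-h)) / (2 * h)                    # approximates M'(0) = E(X)
    m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)     # approximates M''(0) = E(X^2)
    return m1, m2

lam = 2.0
m1, m2 = moments_from_mgf(lambda t: exp_mgf(t, lam))
print(m1, m2, m2 - m1 ** 2)   # close to 0.5, 0.5, 0.25 for lam = 2
```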
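Take Note!:
$$e^{tX} = 1 + tX + \frac{(tX)^2}{2!} + \cdots + \frac{(tX)^k}{k!} + \cdots$$
from the Taylor expansion of $e^{tX}$ about 0. Taking an expected value yields
$$E(e^{tX}) = E\left[1 + tX + \frac{(tX)^2}{2!} + \cdots + \frac{(tX)^k}{k!} + \cdots\right] = 1 + t\,E(X) + \frac{t^2}{2!}E(X^2) + \cdots + \frac{t^k}{k!}E(X^k) + \cdots$$
so the moments appear as the coefficients of the power series of the MGF.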
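If $X \sim M_X$ and $Y \sim M_Y$ with $X \perp Y$, then, since $E[e^{t(c_1 X + c_2 Y)}] = E[e^{c_1 t X}]\,E[e^{c_2 t Y}]$ by independence,
$$M_{c_1 X + c_2 Y}(t) = M_X(c_1 t) \cdot M_Y(c_2 t)$$
and in general, for mutually independent random variables $X_1, \dots, X_n$ and constants $c_1, \dots, c_n$,
$$M_{\sum_{i=1}^{n} c_i X_i}(t) = \prod_{i=1}^{n} M_{X_i}(c_i t)$$
If $X_1, \dots, X_n$ are i.i.d. with cdf $F_X$, then just write
$$M_{\sum_{i=1}^{n} c_i X_i}(t) = \prod_{i=1}^{n} M_X(c_i t)$$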
12.3.1 Example
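Say $Y = \sum_{i=1}^{\alpha} X_i$ with $X_i \sim \text{Exp}(\lambda = \frac{1}{\beta})$ i.i.d. Then
$$M_Y(t) = \prod_{i=1}^{\alpha} \frac{1}{1 - \beta t} = \left[\frac{1}{1 - \beta t}\right]^{\alpha}$$
This is the MGF of a $\Gamma(\alpha, \beta)$ random variable.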
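A simulation sketch of this example in Python, assuming the illustrative values $\alpha = 3$ and $\beta = 2$ (so $\lambda = 1/2$): sums of $\alpha$ exponential draws should have mean near $\alpha\beta = 6$ and variance near $\alpha\beta^2 = 12$, the $\Gamma(\alpha, \beta)$ values. Note that `random.expovariate` takes the rate $\lambda$, not the scale $\beta$.

```python
import random

random.seed(4)
alpha, beta = 3, 2.0         # illustrative: integer shape alpha, scale beta = 1 / lambda
reps = 200_000

# Each Y is a sum of alpha i.i.d. Exp(lambda = 1 / beta) draws.
ys = [sum(random.expovariate(1.0 / beta) for _ in range(alpha)) for _ in range(reps)]

mean = sum(ys) / reps
var = sum((y - mean) ** 2 for y in ys) / (reps - 1)
print(round(mean, 2), alpha * beta)          # Gamma(alpha, beta) mean is alpha * beta
print(round(var, 2), alpha * beta ** 2)      # and its variance is alpha * beta^2
```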
13 Joint Distribution of $\bar{X}$ and $S^2$
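Let $X \sim (\mu, \sigma^2)$, with some unspecified distribution. Let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.
We know from earlier work that $E[\bar{X}] = \mu$ and $\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$. By the Central Limit Theorem, $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ converges in distribution to $N(0, 1)$, so for large $n$, $\bar{X}_n$ is approximately $N(\mu, \frac{\sigma^2}{n})$.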
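But what about the distribution of $S^2$? For the exact result below, assume the $X_i$ are i.i.d. $N(\mu, \sigma^2)$. It turns out that we can write
$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X} + \bar{X} - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$$
since the cross term $2(\bar{X} - \mu)\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.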
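Write $W = \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2}$, $W_1 = \frac{(\bar{X} - \mu)^2}{\sigma^2/n}$, and $W_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$. Then $W = W_1 + W_2$, and $W_1 \perp W_2$ (for normal samples, $\bar{X}$ and $S^2$ are independent).
So $E[e^{tW}] = E[e^{tW_1}]\,E[e^{tW_2}]$, which is just $M_W = M_{W_1} \cdot M_{W_2}$, by independence.
But we know that $W \sim \chi^2_n$, because it is the sum of $n$ independent squared standard normals. We also know that $W_1 \sim \chi^2_1$, since $\bar{X} \sim N(\mu, \sigma^2/n)$ and standardizing it gives a standard normal whose square is $W_1$. That implies $W_2 \sim \chi^2_{n-1}$: it is independent of $W_1$, so its MGF must be
$$M_{W_2}(t) = \frac{M_W(t)}{M_{W_1}(t)} = \frac{\left(\frac{1}{1 - 2t}\right)^{n/2}}{\left(\frac{1}{1 - 2t}\right)^{1/2}} = \left(\frac{1}{1 - 2t}\right)^{(n-1)/2}, \qquad t < \tfrac{1}{2}$$
which is the MGF of $\chi^2_{n-1}$. Just one last bit of algebra:
$$W_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)}{\sigma^2/(n-1)} = \frac{(n-1)S^2}{\sigma^2}$$
So the distribution of $\frac{(n-1)S^2}{\sigma^2}$ is $\chi^2_{n-1}$.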
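A Monte Carlo sketch of this result in Python (standard library; $\mu = 10$, $\sigma = 3$, $n = 6$ and the replication count are illustrative, not from the notes). It draws normal samples, as the argument requires, and checks that $(n-1)S^2/\sigma^2$ has mean near $n - 1$ and variance near $2(n - 1)$, the $\chi^2_{n-1}$ values.

```python
import random

random.seed(5)
mu, sigma, n, reps = 10.0, 3.0, 6, 40_000   # illustrative values

stats = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]   # normal sample, as required
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)    # S^2 with the n - 1 divisor
    stats.append((n - 1) * s2 / sigma ** 2)

mean_stat = sum(stats) / reps
var_stat = sum((s - mean_stat) ** 2 for s in stats) / (reps - 1)
print(round(mean_stat, 2), n - 1)        # chi^2_{n-1} has mean n - 1
print(round(var_stat, 2), 2 * (n - 1))   # and variance 2(n - 1)
```

Replacing `random.gauss` with a skewed distribution would generally break the match, which is why the normality assumption matters here.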