Independence and Mutual Independence of Random Variables - Prof. Kobi Abayomi, Study notes of Data Analysis & Statistical Methods

These notes introduce independent and mutually independent random variables. They cover the definition, examples, and implications of independence, as well as the concept of mutual independence. They also explore the relationship between the joint distribution and independence, with examples involving normal, exponential, and Poisson distributions.

Typology: Study notes

Pre 2010

Uploaded on 08/04/2009

ISYE 2028 A and B
Lecture 8
Kobi Abayomi
March 25, 2009
1 Independent Random Variables

Two random variables $X$ and $Y$ are independent if

$$p_{X,Y}(x, y) = p_X(x)\,p_Y(y) \tag{1}$$

or

$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y) \tag{2}$$

This is directly analogous to the general probability rules. The conditional probability mass and density functions are then just:

$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{p_Y(y)} = \frac{p_X(x)\,p_Y(y)}{p_Y(y)} = p_X(x)$$

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{f_X(x)\,f_Y(y)}{f_Y(y)} = f_X(x)$$

Dependence is any violation of this condition.

1.1 Example

Let $f_{X_1,X_2}(x_1, x_2) = (x_1 + x_2) \cdot 1_{\{0 < x_1 < 1,\ 0 < x_2 < 1\}}$. Are $X_1$ and $X_2$ independent?
Well...:

$$f_1(x_1) = \int f(x_1, x_2)\,dx_2 = \int_0^1 (x_1 + x_2)\,dx_2 = (x_1 + 1/2) \cdot 1_{\{0 \le x_1 < 1\}}$$

$$f_2(x_2) = \int f(x_1, x_2)\,dx_1 = \int_0^1 (x_1 + x_2)\,dx_1 = (1/2 + x_2) \cdot 1_{\{0 \le x_2 < 1\}}$$

But: $x_1 + x_2 \neq (x_1 + 1/2)(x_2 + 1/2)$. The answer is no.
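The check in Example 1.1 can be repeated numerically. The following is a minimal sketch, not part of the original notes: the helper names (`joint_pdf`, `integrate`, `marginal_x1`, `marginal_x2`) and the test point (0.2, 0.7) are illustrative assumptions, and the integrals are approximated with a simple midpoint rule.

```python
def joint_pdf(x1, x2):
    """Joint density f(x1, x2) = x1 + x2 on the open unit square, 0 elsewhere."""
    inside_support = 0 < x1 < 1 and 0 < x2 < 1
    return x1 + x2 if inside_support else 0.0


def integrate(func, lower, upper, steps=2000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (k + 0.5) * width) for k in range(steps)) * width


def marginal_x1(x1):
    """Integrate x2 out of the joint density."""
    return integrate(lambda x2: joint_pdf(x1, x2), 0.0, 1.0)


def marginal_x2(x2):
    """Integrate x1 out of the joint density."""
    return integrate(lambda x1: joint_pdf(x1, x2), 0.0, 1.0)


# Hypothetical test point inside the support.
x1, x2 = 0.2, 0.7
joint_value = joint_pdf(x1, x2)
product_value = marginal_x1(x1) * marginal_x2(x2)
print(f"joint: {joint_value:.4f}, product of marginals: {product_value:.4f}")
```

At this point the joint density is about 0.90 while the product of the marginals is about 0.84, so the factorization fails, in line with the conclusion above.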

1.2 Independence is factorization of the pdf

In general, for $X_1, X_2 \sim f(x_1, x_2)$, if $f(x_1, x_2) = g(x_1)h(x_2)$ this implies that $X_1$ is independent of $X_2$.

This is not a formal “proof”^1: We know that we can always write a joint pdf as the product of a conditional and a marginal...

$$f_{X_1,X_2}(x_1, x_2) = f_{X_2|X_1}(x_2|x_1)\,f_{X_1}(x_1)$$

If the functional form of $f_{X_2|X_1}(x_2|x_1)$ does not depend on $x_1$, say $f_{X_2|X_1}(x_2|x_1) = h(x_2)$, then integrate both sides over $x_1$...

$$\int f_{X_1,X_2}(x_1, x_2)\,dx_1 = \int f_{X_2|X_1}(x_2|x_1)\,f_{X_1}(x_1)\,dx_1$$

which yields

$$f_{X_2}(x_2) = h(x_2) \int f_{X_1}(x_1)\,dx_1$$

and of course, since $f_{X_1}$ integrates to 1,

$$f_{X_2}(x_2) = h(x_2) = f_{X_2|X_1}(x_2|x_1)$$

^1 But then, what is these days...?

$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^{n} P(X_i \le x_i) \tag{3}$$

2.1 Expectations of independent random variables

Recall our result for expectations of sums of random variables

$$E(X_1 + \cdots + X_n) = \sum_{i=1}^{n} E(X_i)$$

This result holds regardless of the dependence structure of the joint distribution $f(x_1, \ldots, x_n)$.

For a product of random variables, the identity

$$E(X_1 \cdots X_n) = \prod_{i=1}^{n} E(X_i)$$

is guaranteed when the $X_i$ are mutually independent; it can fail when they are dependent.
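A small exact calculation illustrates the contrast between sums and products. This is a sketch under assumptions that are not in the notes: two hypothetical joint pmfs on {0, 1} × {0, 1}, one with independent fair coordinates and one with Y = X, and a helper `expectation` written for this illustration.

```python
from fractions import Fraction


def expectation(joint_pmf, func):
    """Exact expectation of func(x, y) under a joint pmf given as {(x, y): prob}."""
    return sum(prob * func(x, y) for (x, y), prob in joint_pmf.items())


half, quarter = Fraction(1, 2), Fraction(1, 4)

# Hypothetical case 1: X and Y are independent fair 0/1 variables.
independent_pmf = {(x, y): quarter for x in (0, 1) for y in (0, 1)}
# Hypothetical case 2: Y = X, so the pair is perfectly dependent.
dependent_pmf = {(0, 0): half, (1, 1): half}

results = {}
for label, pmf in (("independent", independent_pmf), ("dependent", dependent_pmf)):
    mean_x = expectation(pmf, lambda x, y: x)
    mean_y = expectation(pmf, lambda x, y: y)
    results[label] = {
        "E[X+Y]": expectation(pmf, lambda x, y: x + y),
        "E[X]+E[Y]": mean_x + mean_y,
        "E[XY]": expectation(pmf, lambda x, y: x * y),
        "E[X]E[Y]": mean_x * mean_y,
    }
    print(label, results[label])
```

In both cases the expectation of the sum equals the sum of the expectations. The product rule holds in the independent case (1/4 on both sides) but not in the dependent one, where E[XY] = 1/2 while E[X]E[Y] = 1/4.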

2.2 Bivariate independence does not imply mutual independence

Asserting that $X_1, \ldots, X_n$ are mutually independent implies that any $X_i$ is independent of $X_j$ for $i \neq j$.^2 This strongest condition requires that the joint distribution be completely factorable

$$f(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_i(x_i)$$

But: bivariate independence — Xi independent of Xj for any i, j — does not imply mutual independence.

Example:

Let

$$f(x_1, x_2, x_3) = 1/4 \cdot 1_{\{(1,0,0),\,(0,1,0),\,(0,0,1),\,(1,1,1)\}}$$

Which implies

$$f_{ij}(x_i, x_j) = 1/4 \cdot 1_{\{(0,0),\,(1,0),\,(0,1),\,(1,1)\}}$$

and

$$f_i(x_i) = 1/2 \cdot 1_{\{x_i = 0, 1\}}$$

Thus $f_i(x_i) f_j(x_j) = 1/4$, so any $X_i$ and $X_j$ are independent, BUT: $f_i(x_i) f_j(x_j) f_k(x_k) = 1/8 \neq 1/4$. So there is not complete mutual independence.

^2 As well as independence of subsets of greater length.
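The example can be verified by brute-force enumeration. A minimal sketch follows; the helper `marginal` and the variable names are assumptions made for this illustration, and exact fractions are used so that the equality checks are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations, product

# The four equally likely support points from the example above.
support = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
joint_pmf = {point: Fraction(1, 4) for point in support}


def marginal(indices):
    """pmf of the coordinates listed in `indices`, as {values: probability}."""
    result = {}
    for point, prob in joint_pmf.items():
        key = tuple(point[i] for i in indices)
        result[key] = result.get(key, Fraction(0)) + prob
    return result


singles = [marginal((i,)) for i in range(3)]

# Pairwise check: every bivariate pmf equals the product of its marginals.
pairwise_independent = all(
    marginal((i, j)).get((a, b), Fraction(0)) == singles[i][(a,)] * singles[j][(b,)]
    for i, j in combinations(range(3), 2)
    for a, b in product((0, 1), repeat=2)
)

# Mutual check: the trivariate pmf equals the product of all three marginals.
mutually_independent = all(
    joint_pmf.get(point, Fraction(0))
    == singles[0][(point[0],)] * singles[1][(point[1],)] * singles[2][(point[2],)]
    for point in product((0, 1), repeat=3)
)

print("pairwise independent:", pairwise_independent)
print("mutually independent:", mutually_independent)
```

The enumeration reports pairwise independence but not mutual independence, matching the 1/8 versus 1/4 comparison above.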

3 Dist. for Sums of independent Random Variables

Let $X \perp Y$. Calculate the distribution of $Z = X + Y$:

$$F_{X+Y}(z) = P(X + Y \le z) = \iint_{x + y \le z} f_X(x)\,f_Y(y)\,dx\,dy$$

by the independence of $X$ and $Y$,

$$= \int_{-\infty}^{\infty} \int_{-\infty}^{z - y} f_X(x)\,f_Y(y)\,dx\,dy$$

Setting $X = Z - Y$,

$$= \int_{-\infty}^{\infty} F_X(z - y)\,f_Y(y)\,dy \tag{4}$$

yields the equation for the cumulative distribution function (cdf) of $Z$. This is the general form for the cdf of the sum of two independent random variables.

Taking a derivative with respect to $z$ yields

$$f_Z(z) = \frac{d}{dz} F_Z(z) = \frac{d}{dz} \int_{-\infty}^{\infty} F_X(z - y)\,f_Y(y)\,dy = \int_{-\infty}^{\infty} f_X(z - y)\,f_Y(y)\,dy \tag{5}$$
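Equation (5) can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the notes' own method: it takes both variables to be Exp(1), restricts the integral to 0 ≤ y ≤ z because both supports are non-negative, and uses a midpoint rule. The names `exp_pdf` and `sum_pdf` are hypothetical. The result is compared with the Γ(2, 1) density z·e^(−z), which is the known density of the sum of two independent Exp(1) variables.

```python
import math


def exp_pdf(x, rate=1.0):
    """Density of an Exp(rate) random variable."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def sum_pdf(z, pdf_x, pdf_y, steps=4000):
    """Approximate f_Z(z) = integral of f_X(z - y) f_Y(y) dy.

    Assumes both densities vanish on the negative axis, so the integrand
    is zero outside 0 <= y <= z.
    """
    if z <= 0:
        return 0.0
    width = z / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * width
        total += pdf_x(z - y) * pdf_y(y)
    return total * width


z = 1.5
approx_density = sum_pdf(z, exp_pdf, exp_pdf)
gamma_density = z * math.exp(-z)  # Gamma(2, 1) density at z
print(approx_density, gamma_density)
```

For variables supported on the whole real line, the integration limits would have to be widened accordingly.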

4 Joint Distributions of Functions of a Random Variable

We’ve seen the univariate case^4; let’s state it explicitly.

Take $X \sim F_X$, $F_X$ known. What is the density $f_Y$ if $Y = g(X)$, with $g$ a bijection from the space of $X$ to the space of $Y$?

Well... (for an increasing $g$):

$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y))$$

Thus:

$$dF_Y = d\big(F_X(g^{-1}(y))\big) = f_X(g^{-1}(y)) \cdot d[g^{-1}(y)]$$

4.1 Example

Let $X \sim f_X(x) = 2x \cdot 1_{\{0 < x < 1\}}$. Let $Y = 8X^3$. So $Y \in [0, 8]$. What we do in general is to use the known distribution of $X$ to generate the distribution of $Y$.

The event $\{a < Y < b\}$ on the space of $Y$ (the image of the transformation) is the same as the event $\{\tfrac{1}{2}a^{1/3} < X < \tfrac{1}{2}b^{1/3}\}$ on the space of $X$ (the pre-image of the transformation). So...

$$P(\{a < Y < b\}) = P(\{\tfrac{1}{2}a^{1/3} < X < \tfrac{1}{2}b^{1/3}\})$$

Which is just the integral on those limits of the pdf of $X$...

$$\int_{a^{1/3}/2}^{b^{1/3}/2} 2x\,dx$$

We change the variable from $X$ to $Y$ in the integral, to get a statement using the pdf of $X$ but in terms of $Y$: $x = g^{-1}(y) = \tfrac{1}{2}y^{1/3}$, which implies $\frac{dx}{dy} = \frac{d}{dy} g^{-1}(y) = \frac{1}{6y^{2/3}}$.^5

This yields:

$$P(a < Y < b) = \int_a^b 2\left(\frac{y^{1/3}}{2}\right)\left(\frac{1}{6y^{2/3}}\right) dy$$

Here, the pdf $f_Y(y)$, heuristically the probability that $Y$ falls in a small interval around $y$, is the integrand: $f_Y(y) = \frac{1}{6y^{1/3}} \cdot 1_{\{0 < y < 8\}}$.

^4 Refer to past lectures where we investigated the distribution of $Y = X^2$ when $X \sim U(-1, 1)$.

^5 Aside: The derivative of the inverse of a function $f$ is $d[f^{-1}(x)] = \frac{1}{f'(f^{-1}(x))}$.
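As a sanity check on this example, the probability of {a < Y < b} can be computed on either space. The sketch below is illustrative only: the interval (a, b) = (1, 8), the midpoint-rule helper `integrate`, and the function names are assumptions, not part of the notes.

```python
def integrate(func, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (k + 0.5) * width) for k in range(steps)) * width


def pdf_x(x):
    """f_X(x) = 2x on (0, 1)."""
    return 2.0 * x if 0 < x < 1 else 0.0


def pdf_y(y):
    """f_Y(y) = 1 / (6 y^(1/3)) on (0, 8), the density derived for Y = 8 X^3."""
    return 1.0 / (6.0 * y ** (1.0 / 3.0)) if 0 < y < 8 else 0.0


# Hypothetical interval on the space of Y.
a, b = 1.0, 8.0

# Same event, expressed on the space of X: a^(1/3)/2 < X < b^(1/3)/2.
prob_via_x = integrate(pdf_x, a ** (1.0 / 3.0) / 2.0, b ** (1.0 / 3.0) / 2.0)
prob_via_y = integrate(pdf_y, a, b)
print(prob_via_x, prob_via_y)
```

Both integrals come out at about 0.75, which agrees with the closed form (b^(2/3) − a^(2/3))/4 for this interval.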

More generally:

$$P(a < Y < b) = \int_a^b f_X(g^{-1}(y)) \left|\frac{d}{dy} g^{-1}(y)\right| dy$$

so

$$f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy} g^{-1}(y)\right|$$

Remark: 1) We found the pdf, $f_Y$, by changing the variable on the definite integral for the probability of $\{g^{-1}(a) < X < g^{-1}(b)\}$, with $a, b$ on the space of $Y$. 2) We really only needed: i) the conditions for which $f_Y > 0$ and ii) the integrand for $P(a < Y < b)$. These constraints require $\left|\frac{d}{dy} g^{-1}(y)\right|$, which is always non-negative.

So there are two conditions for the change of variable formula: 1) verify that $Y = g(X)$ is a bijection and 2) take the absolute value of the rate of change of the inverse transformation. These two rules yield the univariate formula for the pdf of a function $g(X)$: $f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy} g^{-1}(y)\right|$.

In general, we call $\left|\frac{d}{dy} g^{-1}(y)\right|$ the absolute value of the Jacobian: it is the rate of change of the inverse transformation, which rescales the pdf. The absolute value is necessary so that $f_Y \ge 0$.

4.2 Example

Let $f_X(x) = 1_{\{0 < x < 1\}}$ and $Y = g(X) = -2\ln(X)$. The inverse is $g^{-1}(y) = e^{-y/2}$. Here $\frac{d}{dy}[g^{-1}(y)] = -\tfrac{1}{2}e^{-y/2} < 0$, so it is important to remember to take the absolute value.

Thus:

$$f_Y(y) = f_X(g^{-1}(y))\,|J| = f_X(e^{-y/2}) \left|-\tfrac{1}{2}e^{-y/2}\right| = \tfrac{1}{2}e^{-y/2} \cdot 1_{\{y > 0\}}$$

5.1 Example

Let $X_1, X_2 \sim f_{X_1,X_2}(x_1, x_2) = 1_{\{0 < x_1, x_2 < 1\}}$ and let $Y_1 = g_1(X_1, X_2) = X_1 + X_2$ and $Y_2 = g_2(X_1, X_2) = X_1 - X_2$.

This implies $g_1^{-1}: X_1 = \frac{Y_1 + Y_2}{2}$ and $g_2^{-1}: X_2 = \frac{Y_1 - Y_2}{2}$.

Figure 1: Illustration of transformation $Y_1 = g_1(X_1, X_2) = X_1 + X_2$ and $Y_2 = g_2(X_1, X_2) = X_1 - X_2$. Note the boundaries of figures A and B.

Then

$$|J| = \left|\det \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}\right| = \frac{1}{2}$$

Thus: $f_{Y_1,Y_2}(y_1, y_2) = \tfrac{1}{2} \cdot 1_{\{(y_1, y_2) \in B\}}$

where $B$ lies inside $\{0 < y_1 < 2,\ -1 < y_2 < 1\}$, subject to the constraints illustrated in figure 1: $0 < y_1 + y_2 < 2$ and $0 < y_1 - y_2 < 2$.

The extension to higher dimensions is natural, direct.

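Let $X_1, X_2 \sim \text{Exp}(\lambda)$, independent, thus $(x_1, x_2) \in [0, \infty) \times [0, \infty)$ and $f_{X_i}(x_i) = \lambda e^{-\lambda x_i}$.

Let the transformation be $Y_1 = \frac{X_1}{X_1 + X_2}$ and $Y_2 = X_1 + X_2$, thus $(y_1, y_2) \in (0, 1) \times [0, \infty)$.

Then $x_1 = g_1^{-1}(y_1, y_2) = y_1 \cdot y_2$ and $x_2 = g_2^{-1}(y_1, y_2) = y_2(1 - y_1)$. And

$$|J| = \left|\det \begin{pmatrix} y_2 & y_1 \\ -y_2 & 1 - y_1 \end{pmatrix}\right| = |y_2(1 - y_1) + y_1 y_2| = y_2$$

If $\lambda = 1$, then $f_{X_1,X_2}(x_1, x_2) = e^{-(x_1 + x_2)}$ and thus $f_{Y_1,Y_2}(y_1, y_2) = y_2 \cdot e^{-y_2} \cdot 1_{\{0 < y_2 < \infty\}} \cdot 1_{\{0 < y_1 < 1\}}$.

This implies $Y_1 \perp Y_2$; $Y_2 \sim \Gamma(2, 1)$ and $Y_1 \sim U(0, 1)$.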
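A quick simulation is consistent with this conclusion. The sketch below is only an illustration: the seed, the number of draws, and the summary statistics chosen are assumptions, and a near-zero sample covariance is a consequence of independence rather than a proof of it.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
num_draws = 100_000

ratios, totals = [], []
for _ in range(num_draws):
    x1 = random.expovariate(1.0)  # Exp(1) draw
    x2 = random.expovariate(1.0)  # independent Exp(1) draw
    totals.append(x1 + x2)        # Y2 = X1 + X2
    ratios.append(x1 / (x1 + x2)) # Y1 = X1 / (X1 + X2)


def mean(values):
    return sum(values) / len(values)


mean_ratio = mean(ratios)  # U(0, 1) has mean 1/2
mean_total = mean(totals)  # Gamma(2, 1) has mean 2
covariance = mean([r * t for r, t in zip(ratios, totals)]) - mean_ratio * mean_total
print(mean_ratio, mean_total, covariance)
```

The sample means should land near 1/2 and 2, and the sample covariance near 0, as the factorized joint density suggests.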

6 Order Statistics

Now we come again to statistics^6 in this class; moving focus from the random processes (variables) which generate data, to drawing inference on a supposed process.

The order statistics are ordered values of a random process. Here, the order statistics are not technically observed values: so we write them with big X’s instead of little x’s.

Let $X_1, \ldots, X_n$ be independent and identically distributed random variables; the X’s are ‘i.i.d.’, we say. The ordered values are written $X_{(1)}, \ldots, X_{(n)}$, with:

$$X_{(1)} \equiv \min(X_1, \ldots, X_n), \qquad X_{(n)} \equiv \max(X_1, \ldots, X_n)$$

and $X_{(k)}$ the $k$th smallest value in $X_1, \ldots, X_n$. For example, for $X_{(1)}, \ldots, X_{(2n+1)}$, $X_{(n+1)}$ is the median.

7 Distributions for Order Statistics

As a first example, let $X_1, X_2 \sim f_{X_1,X_2}(x_1, x_2)$ with $X_1 \perp X_2$.

^6 A statistic is any function of observed data. In our language, in this class, observed data are samples $x_1, \ldots, x_n$ from a random process $X$.

8 Distributions of Order Statistics

8.1 Distribution of maximum and minimum

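Starting with the pdf for the maximum $X_{(n)}$:

$$f_{X_{(n)}}(x) = P(\text{one of } (X_1, \ldots, X_n) \text{ in a small interval around } x, \text{ all else} < x)$$

$$= f_X(x) \cdot \left[\int_{-\infty}^{x} f_X(t)\,dt\right]^{n-1} \cdot n$$

$$= n \cdot f_X(x)\,[F_X(x)]^{n-1}$$

The pdf for the minimum $X_{(1)}$ can be found similarly:

$$f_{X_{(1)}}(x) = P(\text{one of } (X_1, \ldots, X_n) \text{ in a small interval around } x, \text{ all else} > x)$$

$$= f_X(x) \cdot \left[\int_{x}^{\infty} f_X(t)\,dt\right]^{n-1} \cdot n$$

$$= n \cdot f_X(x)\,[1 - F_X(x)]^{n-1}$$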
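The two formulas can be exercised on a concrete distribution. The sketch below assumes i.i.d. Exp(1) variables and a sample size of 5, neither of which comes from the notes; the helper names are hypothetical. For Exp(1) the minimum formula collapses to n·e^(−nx), the Exp(n) density, which gives an independent check.

```python
import math


def integrate(func, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (k + 0.5) * width) for k in range(steps)) * width


def max_pdf(x, n, pdf, cdf):
    """Density of the maximum of n i.i.d. variables: n f(x) F(x)^(n-1)."""
    return n * pdf(x) * cdf(x) ** (n - 1)


def min_pdf(x, n, pdf, cdf):
    """Density of the minimum of n i.i.d. variables: n f(x) (1 - F(x))^(n-1)."""
    return n * pdf(x) * (1.0 - cdf(x)) ** (n - 1)


def exp_pdf(x):
    return math.exp(-x) if x >= 0 else 0.0


def exp_cdf(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


sample_size = 5  # hypothetical n

# The density of the maximum should integrate to (essentially) 1;
# the upper limit 40 stands in for infinity.
max_mass = integrate(lambda x: max_pdf(x, sample_size, exp_pdf, exp_cdf), 0.0, 40.0)

# For Exp(1), the minimum of n variables has density n * exp(-n x).
min_at_point = min_pdf(0.3, sample_size, exp_pdf, exp_cdf)
exp_n_at_point = sample_size * math.exp(-sample_size * 0.3)
print(max_mass, min_at_point, exp_n_at_point)
```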

8.2 Full Joint distribution

The joint distribution for the entire order statistics can be ‘derived’ heuristically from what we know about pdf’s of transformations.

Let $X_1, \ldots, X_n \sim f_X$, i.i.d. Then set $Y_k = X_{(k)}$, the $k$th order statistic. Then $Y_k$ is a transformation: $Y_k = g(X_1, \ldots, X_n)$. It is easy to see that the determinant of the Jacobian, the matrix of partial derivatives of the form $\left(\frac{\partial y_i}{\partial x_j}\right)_{i,j = 1..n}$, will be 1. For $n = 3$, for instance, say $y_1 = x_1^*$, $y_2 = x_2^*$, $y_3 = x_3^*$, one possible outcome. The Jacobian is 1 on the diagonal and 0 elsewhere, yielding a determinant of 1. But there are $3!$ possible arrangements, so the pdf is $f_{X_{(1)},X_{(2)},X_{(3)}}(x_{(1)}, x_{(2)}, x_{(3)}) = 3! \cdot f_X(x_{(1)}) \cdot f_X(x_{(2)}) \cdot f_X(x_{(3)})$. In general the result is

$$f_{X_{(1)},\ldots,X_{(n)}}(x_{(1)}, \ldots, x_{(n)}) = n! \cdot f_X(x_{(1)}) \cdots f_X(x_{(n)}), \qquad x_{(1)} < \cdots < x_{(n)}$$

With the full joint pdf you can compute marginal densities for any one or several of the $X_{(1)}, \ldots, X_{(n)}$, by integrating out the remaining variables. The pdf for any $X_{(j)}$:

$$f_{X_{(j)}}(x_{(j)}) = \frac{n!}{(n - j)!\,(j - 1)!} \cdot [F_X(x_{(j)})]^{j-1} \cdot [1 - F_X(x_{(j)})]^{n-j} \cdot f_X(x_{(j)})$$

The pdf for any $X_{(i)}, X_{(j)}$ with $i < j$ is:

$$f_{X_{(i)},X_{(j)}}(x_{(i)}, x_{(j)}) = \frac{n!}{(i - 1)!\,(j - i - 1)!\,(n - j)!} \cdot [F_X(x_{(i)})]^{i-1} \cdot [F_X(x_{(j)}) - F_X(x_{(i)})]^{j-i-1} \cdot [1 - F_X(x_{(j)})]^{n-j} \cdot f_X(x_{(i)}) \cdot f_X(x_{(j)})$$

There is an attractive heuristic for these pdfs. The probability $f_{X_{(i)},X_{(j)}}$ is the multinomial probability: $[F_X(x_{(i)})]^{i-1}$ is the probability that $i - 1$ values are less than the $i$th order statistic $x_{(i)}$; $[F_X(x_{(j)}) - F_X(x_{(i)})]^{j-i-1}$ is the probability of the values between the $i$th and $j$th order statistics; $[1 - F_X(x_{(j)})]^{n-j}$ is the probability of the values greater than the $j$th order statistic.

8.3 Example - distribution of median

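We have been calling $\tilde{x}$ the median, i.e. the value such that $F_X(\tilde{x}) = \tfrac{1}{2}$. For $X_1, \ldots, X_{2n+1}$ random variables, the median is $X_{(n+1)}$. The distribution is

$$f_{\tilde{X}} = f_{X_{(n+1)}} = \frac{(2n + 1)!}{(n + 1 - 1)!\,(2n + 1 - (n + 1))!} \cdot [F_X(x_{(n+1)})]^{n+1-1} \cdot [1 - F_X(x_{(n+1)})]^{2n+1-(n+1)} \cdot f_X(x_{(n+1)})$$

$$= \frac{(2n + 1)!}{n!\,n!} \cdot [F_X(x_{(n+1)})]^{n} \cdot [1 - F_X(x_{(n+1)})]^{n} \cdot f_X(x_{(n+1)})$$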
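To make the median formula concrete, the sketch below evaluates the general order-statistic density for U(0, 1) variables with 2n + 1 = 5 observations. The choice of distribution, the sample size, and all function names are assumptions for illustration. In this case the density reduces to 30·x²·(1 − x)², which should integrate to 1 and have mean 1/2 by symmetry.

```python
import math


def integrate(func, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (k + 0.5) * width) for k in range(steps)) * width


def order_stat_pdf(x, j, n, pdf, cdf):
    """Density of the j-th order statistic out of n i.i.d. variables."""
    coefficient = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
    return coefficient * cdf(x) ** (j - 1) * (1.0 - cdf(x)) ** (n - j) * pdf(x)


def unif_pdf(x):
    return 1.0 if 0 < x < 1 else 0.0


def unif_cdf(x):
    return min(max(x, 0.0), 1.0)


half_size = 2                    # the "n" in the notes' 2n + 1
sample_size = 2 * half_size + 1  # 5 observations, so the median is X_(3)


def median_pdf(x):
    return order_stat_pdf(x, half_size + 1, sample_size, unif_pdf, unif_cdf)


total_mass = integrate(median_pdf, 0.0, 1.0)
mean_of_median = integrate(lambda x: x * median_pdf(x), 0.0, 1.0)
print(total_mass, mean_of_median, median_pdf(0.5))
```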

9 Exchangeable Random Variables

Random variables are called exchangeable if the probability distribution is invariant to permutations, i.e. if

$$P(X_{i_1} < x_1, \ldots, X_{i_n} < x_n) = P(X_1 < x_1, \ldots, X_n < x_n)$$

for any permutation $i_1, \ldots, i_n$.

The uniqueness of the MGF (in this class^7 ) yields that fX (x) = 10 x

11.2 Example

Say $X \sim \text{Ber}(p)$. Then

$$E(e^{tX}) = p \cdot e^{t} + (1 - p) \cdot e^{0} = pe^{t} + (1 - p)$$

11.3 Example

Say $X \sim \text{Bin}(n, p)$. Then

$$E(e^{tX}) = \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k (1 - p)^{n-k}$$

$$= \sum_{k=0}^{n} \binom{n}{k} (pe^{t})^k (1 - p)^{n-k} = (pe^{t} + (1 - p))^n$$

11.4 Example

Say $X \sim N(0, 1)$. Then

$$E[e^{tX}] = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{tx} e^{-\frac{x^2}{2}}\,dx$$

$$= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-\frac{(x - t)^2}{2} + \frac{t^2}{2}}\,dx$$

$$= e^{\frac{t^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-\frac{(x - t)^2}{2}}\,dx$$

$$= e^{\frac{t^2}{2}}$$

^7 I’m going to stop saying that now. The reason that the MGF does not always uniquely determine the pdf relates to the condition on the integrability of $E[e^{tX}]$, the existence of an $h > 0$. A generalization of the MGF, the so-called characteristic function $E(e^{itX})$, is complete for the pdf.

11.5 Example

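Say $X \sim \text{Exp}(\lambda)$. Then, for $t < \lambda$,

$$E[e^{tX}] = \int_{\mathbb{R}^+} e^{tx} \lambda e^{-\lambda x}\,dx$$

$$= \lambda \int_{\mathbb{R}^+} e^{-(\lambda - t)x}\,dx$$

$$= \frac{\lambda}{\lambda - t} = \frac{1}{1 - \beta t}$$

with $\beta = \frac{1}{\lambda}$.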

11.6 Moments

The eponymous result is that we can obtain moments directly from MX. Take the first derivative of MX (t)

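$$\frac{d}{dt} M_X(t) = M'_X(t) = \begin{cases} \int x e^{tx} f_X(x)\,dx, & X \text{ continuous} \\ \sum_x x e^{tx} p_X(x), & X \text{ discrete} \end{cases}$$

This implies that $M'_X(t)|_{t=0} = E(X) = \mu$. Take another derivative

$$\frac{d^2}{dt^2} M_X(t) = M''_X(t) = \begin{cases} \int x^2 e^{tx} f_X(x)\,dx, & X \text{ continuous} \\ \sum_x x^2 e^{tx} p_X(x), & X \text{ discrete} \end{cases}$$

which implies that $M''_X(t)|_{t=0} = E(X^2)$. Thus $\sigma^2 = M''_X(0) - [M'_X(0)]^2$.

In general: $M_X^{(m)}(t)|_{t=0} = E(X^m)$; the $m$th derivative of the MGF, evaluated at $t = 0$, is the $m$th moment of $X$.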
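The moment property can be checked numerically by differentiating an MGF at 0. This sketch assumes the Exp(λ) MGF λ/(λ − t) with a hypothetical rate λ = 2, and approximates the derivatives with central finite differences; the function names and step sizes are illustrative choices.

```python
def exp_mgf(t, rate=2.0):
    """MGF of an Exp(rate) variable, valid for t < rate: rate / (rate - t)."""
    return rate / (rate - t)


def first_derivative(func, t, step=1e-4):
    """Central-difference approximation of func'(t)."""
    return (func(t + step) - func(t - step)) / (2.0 * step)


def second_derivative(func, t, step=1e-3):
    """Central-difference approximation of func''(t)."""
    return (func(t + step) - 2.0 * func(t) + func(t - step)) / step ** 2


mean = first_derivative(exp_mgf, 0.0)            # approximates E(X) = 1/rate
second_moment = second_derivative(exp_mgf, 0.0)  # approximates E(X^2) = 2/rate^2
variance = second_moment - mean ** 2             # approximates 1/rate^2
print(mean, second_moment, variance)
```

With λ = 2 the three values should be close to 0.5, 0.5 and 0.25, which are the mean, second moment and variance of an Exp(2) variable.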

Take Note!:

$$e^{tX} = 1 + tX + \frac{1}{2!}(tX)^2 + \cdots + \frac{1}{k!}(tX)^k + \cdots$$

from the Taylor expansion of $e^{tX}$ about 0. Taking an expected value yields

$$E(e^{tX}) = E\left[1 + tX + \frac{1}{2!}(tX)^2 + \cdots + \frac{1}{k!}(tX)^k + \cdots\right]$$

12.3 MGF of Sums of Independent Random Variables

If $X \sim M_X$ and $Y \sim M_Y$ with $X \perp Y$, then...

$$M_{c_1 X + c_2 Y}(t) = M_X(c_1 t) \cdot M_Y(c_2 t)$$

and in general for independent random variables $X_1, \ldots, X_n$ and constants $c_1, \ldots, c_n$

$$M_{\sum_{i=1}^{n} c_i X_i}(t) = \prod_{i=1}^{n} M_{X_i}(c_i t)$$

If $X_1, \ldots, X_n \sim dF_X$, that is, the $X_i$ are also identically distributed, then just write

$$M_{\sum_{i=1}^{n} c_i X_i}(t) = \prod_{i=1}^{n} M_X(c_i t)$$

12.3.1 Example

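Say $Y = \sum_{i=1}^{\alpha} X_i$ with independent $X_i \sim \text{Exp}(\lambda = \frac{1}{\beta})$, then

$$M_Y(t) = \prod_{i=1}^{\alpha} M_X(t) = \left[\frac{1}{1 - \beta t}\right]^{\alpha}$$

This is the MGF for a $\Gamma(\alpha, \beta)$ random variable.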

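13 Joint Dist of $\bar{X}$ and $S^2$

Let $X \sim (\mu, \sigma^2)$, with some unspecified distribution. Let $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2$.

We know from earlier work that $E[\bar{X}] = \mu$ and $Var(\bar{X}) = \frac{\sigma^2}{n}$. By the Central Limit Theorem we know that, for large $n$, $\bar{X}_n$ is approximately $N(\mu, \frac{\sigma^2}{n})$.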

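But what about the distribution of $S^2$? It turns out that we can write

$$\sum (X_i - \mu)^2 = \sum (X_i - \bar{X} + \bar{X} - \mu)^2$$

$$= \sum (X_i - \bar{X})^2 + \sum (\bar{X} - \mu)^2$$

since $2(\bar{X} - \mu) \sum (X_i - \bar{X}) = 0$. Note that $\sum_{i=1}^{n} (\bar{X} - \mu)^2 = n(\bar{X} - \mu)^2$.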

Write $W = \frac{\sum (X_i - \mu)^2}{\sigma^2}$, $W_1 = \frac{(\bar{X} - \mu)^2}{\sigma^2 / n}$ and $W_2 = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}$. Then $W = W_1 + W_2$ and $W_1 \perp W_2$.

So $E[e^{tW}] = E[e^{tW_1}]\,E[e^{tW_2}]$, which is just $M_W = M_{W_1} \cdot M_{W_2}$, by independence.

But we know that $W \sim \chi^2_n$ when the $X_i$ are normal, because it is the sum of $n$ squared standardized deviations. We also know that $W_1 \sim \chi^2_1$, since $\bar{X} \sim N(\mu, \sigma^2 / n)$ and we standardize it to get $W_1$. That implies that $W_2 \sim \chi^2_{n-1}$, as it is independent of $W_1$, and its MGF must be

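$$M_{W_2}(t) = \frac{M_W(t)}{M_{W_1}(t)} = \frac{\left(\frac{1}{1 - 2t}\right)^{n/2}}{\left(\frac{1}{1 - 2t}\right)^{1/2}} = \left(\frac{1}{1 - 2t}\right)^{(n-1)/2}$$

which is the MGF of $\chi^2_{n-1}$. Just one last bit of algebra:

$$W_2 = \frac{\sum (X_i - \bar{X})^2 / (n - 1)}{\sigma^2 / (n - 1)} = (n - 1) \frac{S^2}{\sigma^2}$$

So the distribution of $(n - 1)\frac{S^2}{\sigma^2}$ is $\chi^2_{n-1}$.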
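A simulation gives a rough check of this result. The sketch below assumes normally distributed data with hypothetical parameters (μ = 10, σ = 3, n = 5) and a fixed seed; it compares the sample mean and variance of (n − 1)S²/σ² with the χ²_{n−1} values n − 1 and 2(n − 1). Matching two moments is consistent with the claim but is not a full distributional test.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical settings for the illustration.
mu, sigma, sample_size, num_repeats = 10.0, 3.0, 5, 20_000

scaled_variances = []
for _ in range(num_repeats):
    sample = [random.gauss(mu, sigma) for _ in range(sample_size)]
    sample_mean = sum(sample) / sample_size
    # Sample variance S^2 with the n - 1 denominator.
    s_squared = sum((x - sample_mean) ** 2 for x in sample) / (sample_size - 1)
    scaled_variances.append((sample_size - 1) * s_squared / sigma ** 2)

empirical_mean = sum(scaled_variances) / num_repeats
empirical_variance = (
    sum((w - empirical_mean) ** 2 for w in scaled_variances) / (num_repeats - 1)
)

# A chi-square variable with n - 1 degrees of freedom has mean n - 1
# and variance 2(n - 1).
print(empirical_mean, sample_size - 1)
print(empirical_variance, 2 * (sample_size - 1))
```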