Joint Distributions: Prob. Mass & Density, Marginal & Conditional, Examples - Prof. Kobi A, Study notes of Data Analysis & Statistical Methods

These notes cover joint distributions, including joint probability mass and density functions, marginal distributions, and conditional distributions. They give examples of calculating joint probabilities, marginal distributions, and conditional distributions for discrete and continuous random variables, and discuss the relationship between the joint distribution and the marginal distributions.

Uploaded on 08/04/2009

ISYE 2028 A and B

Lecture 5

Dr. Kobi Abayomi

January 29, 2009

1 Joint Distributions

Two random variables $X$ and $Y$ have a joint distribution, an extension of the single-variable definition and notation, which we generate from first principles:

$$F_{X,Y}(x, y) = P(X \le x,\; Y \le y) \tag{1}$$

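Taking derivatives, $dF_{X,Y}(x, y)$, yields...

...in the discrete case:

$$P((X, Y) = (x, y)) = p(x, y) \tag{2}$$

the joint probability mass function.

...in the continuous case:

$$P((X, Y) = (x \pm \epsilon,\; y \pm \epsilon)) = f(x, y) \tag{3}$$

the joint probability density function. Here $\epsilon$ is a small increment and the equality is heuristic: $f(x, y)$ measures probability per unit area near $(x, y)$.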
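One way to make "taking derivatives" concrete in the continuous case, stated here as a standard fact rather than something spelled out in these notes, is that the joint density is the mixed partial derivative of the joint cdf wherever that derivative exists, and the cdf is recovered by integrating the density. The dummy variables $u$ and $v$ are introduced only for this sketch.

```latex
f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y),
\qquad
F_{X,Y}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du
```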

1.1 Marginal Distributions

We generate the marginal distributions for X and Y alone just as we did for contingency tables by summing over all values of the other variable.

$$p_X(x) = \sum_{y} p(x, y); \qquad p_Y(y) = \sum_{x} p(x, y) \tag{4}$$

$$f_X(x) = \int_{\mathbb{R}} f(x, v)\,dv; \qquad f_Y(y) = \int_{\mathbb{R}} f(u, y)\,du \tag{5}$$

Recall our contingency table example(s), where we generated a marginal distribution by summing over the columns of the table to yield the distributions of the margins.
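The summation in the discrete case can be sketched in a few lines of Python. The joint pmf values below are hypothetical, chosen only so that they sum to one; the function name `marginals` is likewise just an illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf p(x, y) on a 2 x 3 grid (illustrative values only).
joint_pmf = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}


def marginals(pmf):
    """Return (p_X, p_Y) by summing the joint pmf over the other variable."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), prob in pmf.items():
        p_x[x] += prob  # sum over y for this fixed x
        p_y[y] += prob  # sum over x for this fixed y
    return dict(p_x), dict(p_y)


p_x, p_y = marginals(joint_pmf)
print("p_X:", p_x)
print("p_Y:", p_y)
```

Exact fractions are used so that each marginal can be checked to sum to exactly one.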

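The marginal cdf is recovered from the joint cdf by letting the other argument grow without bound:

$$\begin{aligned}
F_X(x) = P(X \le x) &= P(X \le x,\; Y \le \infty) = P\left(\lim_{y \uparrow \infty} \{X \le x,\; Y \le y\}\right) \\
&= \lim_{y \uparrow \infty} P(\{X \le x,\; Y \le y\}) \\
&= \lim_{y \uparrow \infty} F_{X,Y}(x, y)
\end{aligned}$$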

The joint survival distribution can be generated from first principles as well:

$$\begin{aligned}
P(X > x,\; Y > y) &= 1 - P(\{X > x,\; Y > y\}^c) \\
&= 1 - P(\{X > x\}^c \cup \{Y > y\}^c) && \text{...by De Morgan's laws...} \\
&= 1 - P(\{X \le x\} \cup \{Y \le y\}) \\
&= 1 - [P(X \le x) + P(Y \le y) - P(X \le x,\; Y \le y)] && \text{...by the inclusion-exclusion principle...} \\
&= 1 - F_X(x) - F_Y(y) + F_{X,Y}(x, y) && \text{...by changing notation...}
\end{aligned}$$
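The survival identity can be checked numerically on a small discrete example. The sketch below uses a hypothetical joint pmf (the values are not from the notes) and compares the directly summed $P(X > x, Y > y)$ with $1 - F_X(x) - F_Y(y) + F_{X,Y}(x, y)$.

```python
from fractions import Fraction

# Hypothetical joint pmf on {0, 1, 2} x {0, 1} (illustrative values only).
joint_pmf = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(1, 8),
    (2, 0): Fraction(1, 8), (2, 1): Fraction(2, 8),
}


def joint_cdf(x, y):
    """F_{X,Y}(x, y) = P(X <= x, Y <= y)."""
    return sum((p for (a, b), p in joint_pmf.items() if a <= x and b <= y), Fraction(0))


def cdf_x(x):
    """Marginal cdf F_X(x) = P(X <= x)."""
    return sum((p for (a, _), p in joint_pmf.items() if a <= x), Fraction(0))


def cdf_y(y):
    """Marginal cdf F_Y(y) = P(Y <= y)."""
    return sum((p for (_, b), p in joint_pmf.items() if b <= y), Fraction(0))


def survival_direct(x, y):
    """P(X > x, Y > y) summed directly from the pmf."""
    return sum((p for (a, b), p in joint_pmf.items() if a > x and b > y), Fraction(0))


def survival_from_cdfs(x, y):
    """The same probability via 1 - F_X(x) - F_Y(y) + F_{X,Y}(x, y)."""
    return 1 - cdf_x(x) - cdf_y(y) + joint_cdf(x, y)


print(survival_direct(0, 0), survival_from_cdfs(0, 0))
```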

You can use the facts above to verify:

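$$\int_{1/3}^{2}\int_{0}^{3/4} f_{X,Y}(x, y)\,dx\,dy = \int_{1/3}^{1}\int_{0}^{3/4} 6x^2 y\,dx\,dy + \int_{1}^{2}\int_{0}^{3/4} 0\,dx\,dy = \frac{3}{8} + 0 = \frac{3}{8}$$

...which is the volume under the surface $6x^2 y$ over the region of integration.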
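The value 3/8 can be reproduced numerically. The sketch below rests on two assumptions that are not legible in the text above: that the pdf is $6x^2y$ on the unit square and zero elsewhere, and that the region of interest is $0 < x < 3/4$, $1/3 < y < 1$ (limits chosen because they are consistent with the stated result). The helper `midpoint_double_integral` is a plain midpoint rule written for this illustration.

```python
def midpoint_double_integral(f, x_lo, x_hi, y_lo, y_hi, n=200):
    """Approximate the double integral of f over a rectangle with the midpoint rule."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy


def joint_pdf(x, y):
    """Assumed joint pdf: 6 x^2 y on the unit square, zero elsewhere."""
    return 6.0 * x * x * y if 0 < x < 1 and 0 < y < 1 else 0.0


# Assumed region: 0 < x < 3/4 and 1/3 < y < 1.
prob = midpoint_double_integral(joint_pdf, 0.0, 0.75, 1.0 / 3.0, 1.0)
total_mass = midpoint_double_integral(joint_pdf, 0.0, 1.0, 0.0, 1.0)
print(prob, total_mass)
```

Under these assumptions the first number should be close to 3/8 and the second close to 1, which is what a valid pdf requires.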

2.3 Example

Let $f_{X,Y}(x, y) = c \cdot 1_{\{x^2 + y^2 \le R^2\}}$, with $c$ and $R$ some constants. What is $c$? What are the marginal distributions? What is the distribution of the distance of any point $(X, Y)$ from the origin?

First:

$$c \iint_{x^2 + y^2 \le R^2} dx\,dy = 1 \;\rightarrow\; c = \left(\iint_{x^2 + y^2 \le R^2} dx\,dy\right)^{-1} = \frac{1}{\pi R^2}$$

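Second:

$$f_X(x) = \int f_{X,Y}(x, y)\,dy = \frac{1}{\pi R^2} \int_{x^2 + y^2 \le R^2} dy$$

Substitute $y = \sqrt{R^2 - x^2}$:

$$f_X(x) = \frac{1}{\pi R^2} \int_{-y}^{y} dt = \frac{1}{\pi R^2}\, 2y = \frac{2}{\pi R^2} \sqrt{R^2 - x^2}, \qquad |x| \le R$$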

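Third:

Let $D = \sqrt{X^2 + Y^2}$. Then, from first principles,

$$\begin{aligned}
F_D(d) &= P\left(\sqrt{X^2 + Y^2} \le d\right) \\
&= P(X^2 + Y^2 \le d^2) \\
&= \iint_{x^2 + y^2 \le d^2} \frac{1}{\pi R^2}\,dx\,dy \\
&= \frac{1}{\pi R^2} \iint_{x^2 + y^2 \le d^2} dx\,dy \\
&= \frac{\pi d^2}{\pi R^2} = \frac{d^2}{R^2}, \qquad 0 \le d \le R
\end{aligned}$$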
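A quick simulation gives a sanity check on this example. The sketch below draws points uniformly on a disk by rejection sampling (a sampling technique chosen for this illustration, not one used in the notes), compares the empirical cdf of the distance with $d^2/R^2$, and checks that the marginal density $\frac{2}{\pi R^2}\sqrt{R^2 - x^2}$ integrates to about one. The radius, sample size, and seed are arbitrary choices.

```python
import math
import random


def sample_uniform_disk(radius, n, seed=0):
    """Draw n points uniformly on the disk x^2 + y^2 <= radius^2 by rejection sampling."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:  # keep only points inside the disk
            points.append((x, y))
    return points


def marginal_pdf_x(x, radius):
    """f_X(x) = 2 sqrt(R^2 - x^2) / (pi R^2) for |x| <= R, and 0 otherwise."""
    if abs(x) > radius:
        return 0.0
    return 2.0 * math.sqrt(radius * radius - x * x) / (math.pi * radius * radius)


R = 2.0
d = 1.0
points = sample_uniform_disk(R, 100_000)

# Empirical P(D <= d) versus the derived cdf d^2 / R^2.
empirical_cdf = sum(math.hypot(x, y) <= d for x, y in points) / len(points)
theoretical_cdf = d * d / (R * R)

# Midpoint-rule check that the marginal density integrates to one over [-R, R].
n = 10_000
h = 2 * R / n
marginal_mass = sum(marginal_pdf_x(-R + (i + 0.5) * h, R) for i in range(n)) * h

print(empirical_cdf, theoretical_cdf, marginal_mass)
```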

2.4 Example - Multinomial Distribution

The multinomial distribution is an extension of the binomial distribution.

In this model, there are $n$ identical experiments, each with $k$ possible outcomes, and outcome $i$ has probability $p_i$, with $\sum_{i=1}^{k} p_i = 1$.

From first principles, the probability mass function is:

$$P(X_1 = n_1, \ldots, X_k = n_k) = C^{n}_{n_1, \ldots, n_k}\, p_1^{n_1} \cdots p_k^{n_k} \tag{8}$$

$C^{n}_{n_1, \ldots, n_k} = \frac{n!}{n_1! \cdots n_k!}$ is the multinomial coefficient, which you have seen before, with $\sum_{i=1}^{k} n_i = n$.

Example:

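Roll an unfair die: let the probability of rolling a one be 1/2, let the probabilities of rolling a two and a three be equal and each twice that of rolling a four or a five, and let the probability of rolling a six be 1/4. What is the probability of rolling 3 ones, 2 twos, 2 threes, 1 four, 1 five, and no sixes?

The probabilities must sum to one, so $p_2 = p_3 = 1/12$ and $p_4 = p_5 = 1/24$. With $n = 9$ rolls, equation (8) gives

$$\frac{9!}{3!\,2!\,2!\,1!\,1!\,0!} \left(\frac{1}{2}\right)^3 \left(\frac{1}{12}\right)^2 \left(\frac{1}{12}\right)^2 \left(\frac{1}{24}\right)^1 \left(\frac{1}{24}\right)^1 \left(\frac{1}{4}\right)^0$$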
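The die example can be evaluated exactly with a short sketch of equation (8). The function name `multinomial_pmf` is an illustration, and the face probabilities are the ones implied by the problem statement: the constraints force $p_2 = p_3 = 1/12$ and $p_4 = p_5 = 1/24$, since $1/2 + 1/4 + 2(2q) + 2q = 1$ gives $q = 1/24$.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """P(X_1 = n_1, ..., X_k = n_k) for a multinomial with the given cell probabilities."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to one")
    # Multinomial coefficient n! / (n_1! ... n_k!), kept as an exact integer.
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    result = Fraction(coefficient)
    for c, p in zip(counts, probs):
        result *= p ** c
    return result


# Faces one through six; values implied by the constraints in the example.
die_probs = [Fraction(1, 2), Fraction(1, 12), Fraction(1, 12),
             Fraction(1, 24), Fraction(1, 24), Fraction(1, 4)]
die_counts = [3, 2, 2, 1, 1, 0]

answer = multinomial_pmf(die_counts, die_probs)
print(answer, float(answer))
```

With two cells the same function reduces to the binomial pmf, which is one way to sanity-check it.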

3 The conditional distribution

Recall from first principles:

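$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{P(X \in x \pm \epsilon,\; Y \in y \pm \epsilon)}{P(Y \in y \pm \epsilon)} = P(X \in x \pm \epsilon \mid Y \in y \pm \epsilon) \tag{9}$$

Here $\epsilon$ is a small increment and the last two equalities are heuristic.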

This is the conditional probability density function for X given Y = y. We generate the conditional cumulative distribution function in the usual way:

The conditional expectation

$$E(X_1 | X_2 = x_2) = \int_{0}^{x_2} x_1 \left(\frac{1}{x_2}\right) dx_1 = \frac{x_2}{2}$$

in this instance is a function of $x_2$. The conditional variance

$$Var(X_1 | X_2 = x_2) = \int_{0}^{x_2} \left(x_1 - \frac{x_2}{2}\right)^2 \left(\frac{1}{x_2}\right) dx_1 = \frac{x_2^2}{12}$$

is also, in this instance, a function of $x_2$.

N.B. (nota bene): $E(X_1) = \int_{0}^{1} x_1 (2 - 2x_1)\,dx_1 = 1/3$, but $E(X_1 | X_2 = x_2) = \frac{x_2}{2}$. The expectation of $X_1$ is a constant, but the conditional expectation of $X_1$ given $X_2$ is a random variable.

Are X 1 and X 2 independent? Heuristically, by just looking at the pdf — with the indicator — we could conclude no.

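Does $P(0 < X_1 < \tfrac{1}{2} \mid X_2 = \tfrac{3}{4}) \stackrel{?}{=} P(0 < X_1 < \tfrac{1}{2})$

Well: on the one hand,

$$P(0 < X_1 < \tfrac{1}{2} \mid X_2 = \tfrac{3}{4}) = \int_{0}^{1/2} f_{X_1 | X_2 = 3/4}(x_1 | x_2 = \tfrac{3}{4})\,dx_1 = \int_{0}^{1/2} \frac{4}{3}\,dx_1 = \frac{2}{3},$$

but on the other,

$$P(0 < X_1 < \tfrac{1}{2}) = \int_{0}^{1/2} f_{X_1}(x_1)\,dx_1 = \int_{0}^{1/2} 2(1 - x_1)\,dx_1 = \frac{3}{4}.$$

The two probabilities differ, so $X_1$ and $X_2$ are not independent.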
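The two probabilities, and the conditional mean and variance above, can be checked with a one-dimensional midpoint rule. The sketch takes the conditional density $1/x_2$ on $(0, x_2)$ and the marginal density $2(1 - x_1)$ on $(0, 1)$ directly from the formulas used in this section; the helper `integrate` and the choice $x_2 = 3/4$ for the moment checks are illustrative.

```python
def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h


def conditional_pdf_x1(x1, x2):
    """f_{X1|X2}(x1 | x2) = 1 / x2 on 0 < x1 < x2, as used in the notes."""
    return 1.0 / x2 if 0 < x1 < x2 else 0.0


def marginal_pdf_x1(x1):
    """f_{X1}(x1) = 2 (1 - x1) on 0 < x1 < 1, as used in the notes."""
    return 2.0 * (1.0 - x1) if 0 < x1 < 1 else 0.0


x2 = 0.75

# P(0 < X1 < 1/2 | X2 = 3/4) versus the unconditional P(0 < X1 < 1/2).
p_conditional = integrate(lambda t: conditional_pdf_x1(t, x2), 0.0, 0.5)
p_marginal = integrate(marginal_pdf_x1, 0.0, 0.5)

# Conditional mean x2 / 2 and conditional variance x2^2 / 12.
cond_mean = integrate(lambda t: t * conditional_pdf_x1(t, x2), 0.0, x2)
cond_var = integrate(lambda t: (t - cond_mean) ** 2 * conditional_pdf_x1(t, x2), 0.0, x2)

print(p_conditional, p_marginal, cond_mean, cond_var)
```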

5 Example

Take two random variables $X_1, X_2 \sim f_{X_1, X_2}(x_1, x_2) = 6x_2 \cdot 1_{\{0 < x_2 < x_1 < 1\}}$.

The marginal pdf for X 1 is...

$$f_{X_1}(x_1) = \int_{0}^{x_1} 6x_2\,dx_2 = 3x_1^2 \cdot 1_{\{0 < x_1 < 1\}}$$

...the conditional pdf for X 2 |X 1 = x 1 is...

$$f_{X_2 | X_1 = x_1}(x_2 | x_1) = \frac{f_{X_1, X_2}}{f_{X_1}} = \frac{6x_2}{3x_1^2} = \frac{2x_2}{x_1^2} \cdot 1_{\{0 < x_2 < x_1 < 1\}}$$

and the conditional expectation is...

$$E(X_2 | X_1 = x_1) = \int_{0}^{x_1} x_2 \left(\frac{2x_2}{x_1^2}\right) dx_2 = \frac{2}{3} x_1 \cdot 1_{\{0 < x_1 < 1\}}$$

...a random variable.

Let $Y = E(X_2 | X_1) = \frac{2}{3} X_1$. Then $Y$ is a random variable, dependent upon the value of $X_1$, with $0 < y < 2/3$ since $0 < x_1 < 1$.

The cdf for Y is

$$F_Y(y) = P(Y \le y) = P\left(\tfrac{2}{3} X_1 \le y\right) = P\left(X_1 \le \tfrac{3y}{2}\right)$$

which can be computed using the pdf for X 1 as...

$$\int_{0}^{3y/2} 3x_1^2\,dx_1 = \frac{27y^3}{8}, \qquad 0 < y < \tfrac{2}{3}$$

The pdf for Y is...

$$dF_Y(y) = f_Y(y) = \frac{81y^2}{8} \cdot 1_{\{0 < y < 2/3\}}$$

the expectation for Y is...

$$E(Y) = \int_{0}^{2/3} y \left(\frac{81y^2}{8}\right) dy = \frac{1}{2}$$

and the variance is...

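$$Var(Y) = \int_{0}^{2/3} y^2 \left(\frac{81y^2}{8}\right) dy - \left(\frac{1}{2}\right)^2 = \frac{4}{15} - \frac{1}{4} = \frac{1}{60}$$

So $Y = E(X_2 | X_1)$ is a random variable with $\mu_Y = \frac{1}{2}$, $\sigma^2_Y = \frac{1}{60}$.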

N.B.: $f_{X_2}(x_2) = \int_{x_2}^{1} 6x_2\,dx_1 = 6x_2(1 - x_2) \cdot 1_{\{0 < x_2 < 1\}}$, which yields $E(X_2) = \frac{1}{2}$ and $Var(X_2) = \frac{1}{20}$.
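The moments in this example can be verified with exact arithmetic. The sketch below integrates polynomial densities term by term, using $f_Y(y) = 81y^2/8$ on $(0, 2/3)$ and $f_{X_2}(x_2) = 6x_2 - 6x_2^2$ on $(0, 1)$; the helper `integrate_monomial` is written for this illustration. Working through the integrals this way gives $Var(Y) = 1/60$, which is smaller than $Var(X_2) = 1/20$.

```python
from fractions import Fraction


def integrate_monomial(coef, power, upper):
    """Exact integral of coef * t**power over (0, upper)."""
    return Fraction(coef) * Fraction(upper) ** (power + 1) / (power + 1)


# Y = E(X2 | X1) has pdf (81/8) y^2 on (0, 2/3).
upper_y = Fraction(2, 3)
c = Fraction(81, 8)
mean_y = integrate_monomial(c, 3, upper_y)                # integral of y * f_Y(y)
var_y = integrate_monomial(c, 4, upper_y) - mean_y ** 2   # E(Y^2) - E(Y)^2

# X2 has pdf 6 x (1 - x) = 6x - 6x^2 on (0, 1).
mean_x2 = integrate_monomial(6, 2, 1) - integrate_monomial(6, 3, 1)
second_moment_x2 = integrate_monomial(6, 3, 1) - integrate_monomial(6, 4, 1)
var_x2 = second_moment_x2 - mean_x2 ** 2

print(mean_y, var_y, mean_x2, var_x2)
```

Both means equal 1/2, while conditioning shrinks the variance.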

6 $E(E(X_1 | X_2)) = E(X_1)$ and $Var(E(X_1 | X_2)) \le Var(X_1)$

The big result, implied by the last example, is

$$\rightarrow \; Var(X_2) \ge Var(E(X_2 | X_1))$$

The upshot is that variance can be reduced by conditioning, though conditioning does not change the expectation. That makes sense.
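One standard way to see why the inequality holds, stated here as a known identity rather than something derived in these notes, is the law of total variance. The first term on the right is an expectation of a nonnegative quantity, so dropping it can only make the right-hand side smaller.

```latex
Var(X_2) = E\left[ Var(X_2 \mid X_1) \right] + Var\left( E[X_2 \mid X_1] \right)
\quad\Longrightarrow\quad
Var(X_2) \ge Var\left( E[X_2 \mid X_1] \right)
```

For the pdf $6x_2 \cdot 1_{\{0 < x_2 < x_1 < 1\}}$ of the last example, direct calculation gives $Var(X_2) = 1/20$ and $Var(E[X_2 | X_1]) = 1/60$, so the remaining term is $E[Var(X_2 | X_1)] = 1/20 - 1/60 = 1/30$.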

These results may not be covered in order in your text — but you should know them now, nonetheless.

7 Exercises

  • Let $f(x_1, x_2) = 21 x_1^2 x_2^3 \cdot 1_{\{0 < x_1 < x_2 < 1\}}$. Find the conditional mean and variance of $X_1$ given $X_2 = x_2$. Find the distribution of $Y = E(X_1 | X_2)$. Determine and compare the mean and variance of $Y$ to the mean and variance of $X_1$.
  • Let $X_1, X_2$ have the joint pdf $f(x_1, x_2)$ given by

      (x1, x2):    (0, 0)  (0, 1)  (1, 0)  (1, 1)  (2, 0)  (2, 1)
      f(x1, x2):    1/18    3/18    4/18    3/18    6/18    1/18

    Find the marginal probability density functions and the two conditional means.
  • Let $X_1$ ≡ a number chosen at random on the unit interval $(0, 1)$. Let $X_2$ ≡ a number chosen at random on $(0, x_1)$. Make some assumptions about the marginal pdf $f_{X_1}$ and compute the conditional mean $E(X_1 | x_2)$.
  • Let $f(x)$ and $F(x)$ denote the pdf and cdf for a random variable $X$. The conditional pdf of $X$, given $X > x_0$, with $x_0$ some constant, is defined by $f(x | X > x_0) = \frac{f(x)}{1 - F(x_0)} \cdot 1_{\{x_0 < x\}}$. This is known as the hazard rate for $X$. Show that the hazard rate is a pdf. Let $f(x) = e^{-x} \cdot 1_{\{0 < x < \infty\}}$. Compute $f(x | X > x_0)$.