Lecture Notes on Random Variable - Basic Statistical Method | ISYE 2028, Study notes of Data Analysis & Statistical Methods

Material Type: Notes; Professor: Abayomi; Class: Basic Statistical Meth; Subject: Industrial & Systems Engr; University: Georgia Institute of Technology-Main Campus; Term: Spring 2009;

ISYE 2028 A and B
Lecture 4
Dr. Kobi Abayomi
January 20, 2009
1 Introduction - Continuous Random Variables

We call a random variable continuous if it has an uncountable number of values; that is, if it can take all values in an interval.

Examples of continuous random variables: survival time of drinkers of Smoke-Colar; time to recidivism for a parolee of the Savings and Loan Scandal; amount of weight lost. Etc. Etc.

That the definition of continuous closely matches the version we use in single variable calculus is natural and should make us feel good.

We can extend what we’ve said already about discrete random variables, using $\sum$’s, to say analogous things about continuous random variables, using $\int$’s. Remember that the integral, $\int$, is just the limit of $\sum$, as the discrete index goes to be an infinitesimal. (1)

(1) In Leibniz’s view of the calculus. Now would be a good time to break out your Calc I textbook, if you need to.

2 Probability Distribution of a Random Variable

Let’s extend the definition of the probability distribution to the continuous case by first restating that the distribution is the complete specification of values of the random variable with assigned probabilities. In the discrete case we could use this heuristic to write down a function or a table. In the continuous case, the distribution of the random variable is explicitly functional.

Here is an explicit definition of a continuous probability distribution or probability density function (pdf):

For X a continuous random variable, the pdf of X is the function $f(x)$ such that:

$$P(a \le X \le b) = \int_a^b f(x)\,dx \tag{1}$$

We call $f(x)$ the density curve for X. We can also restate some of our probability rules using this new definition.

For X a continuous random variable, on the real line, with density function $f(x)$:

  • $\int_{-\infty}^{x} f(u)\,du = F(x)$. $F(x)$ is called the distribution function for X. $F(x) = P(X \le x)$, or the probability that the random variable X is less than or equal to x.

  • $\int_{-\infty}^{+\infty} f(x)\,dx = F(+\infty) - F(-\infty) = 1 - 0 = 1$. Pay attention to nuance here: the distribution function of X is 1 at infinity, since every value of X is less than or equal to infinity. There is an analogous argument for $F(-\infty) = 0$. And I point out that the area under the density curve must equal 1.

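  • For all $x \in\, ]-\infty, +\infty[$, $f(x) \ge 0$. Unlike a probability, a density value may exceed 1 (a density that is constant at 2 on $[0, 1/2]$ is perfectly valid); only the total area under the curve is constrained to equal 1.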

2.1 Example

Say we have an interval $A = \{x : 0 \le x \le 2\}$ where we observe a real valued random variable, X. Say we believe the distribution function is of some form $F(x) = cx^2$, with c a constant. We can immediately determine c: since $F(2) = 1 = 4c$, we get $c = 1/4$.

As well,

$$F(x) = \frac{x^2}{4} = \int_0^x f(u)\,du \;\rightarrow\; f(x) = \frac{x}{2}$$

Then the probabilities for any interval follow, for example

$$P\left(\tfrac{1}{4} \le X \le \tfrac{1}{2}\right) = \int_{1/4}^{1/2} \frac{u}{2}\,du = F(1/2) - F(1/4) = \frac{3}{64}.$$
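The example can be checked with a few lines of Python. This is only a sketch: the helper `integrate` is a hypothetical midpoint-rule routine written for this note, not part of the lecture, and the function names `F` and `f` simply mirror the symbols above.

```python
from fractions import Fraction

def F(x):
    """Distribution function F(x) = x^2 / 4 on [0, 2], from the example."""
    return x * x / 4

def f(x):
    """Density f(x) = x / 2, the derivative of F."""
    return x / 2

def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

# Exact probability from the distribution function, in rational arithmetic.
p_exact = F(Fraction(1, 2)) - F(Fraction(1, 4))

# The same probability from integrating the density numerically.
p_numeric = integrate(f, 0.25, 0.5)

# The total area under the density on [0, 2] should be 1.
total_area = integrate(f, 0.0, 2.0)

print(p_exact, p_numeric, total_area)
```

The exact difference of distribution-function values and the numerical integral of the density should agree, which is the content of equation (1).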

2.2 Features

The property of the complement yields:

$$\bar{F}(x) \equiv P(X > x) = 1 - P(X \le x) = 1 - F(x) \tag{2}$$

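$$E(X) = \int_0^2 \frac{x^2}{2}\,dx = \frac{8}{6}$$

In general:

$$E(h(X)) = \int_{\mathbb{R}} h(x) f(x)\,dx \tag{7}$$

for any function, h.

Example:

Say $X \sim f(x) = \frac{x}{2}$ on $[0, 2]$.

Then

$$E(2X) = \int_0^2 2 \cdot \frac{x^2}{2}\,dx = 2 \cdot \frac{8}{6}$$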
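Equation (7) translates directly into a small numerical routine. The sketch below assumes the density $f(x) = x/2$ on $[0, 2]$ from the running example; `integrate` and `expectation` are hypothetical helper names introduced here for illustration.

```python
def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def density(x):
    """Density f(x) = x / 2 on [0, 2] (the running example)."""
    return x / 2

def expectation(h, lo=0.0, hi=2.0):
    """E(h(X)): integrate h(x) * f(x) over the support [lo, hi]."""
    return integrate(lambda x: h(x) * density(x), lo, hi)

mean_x = expectation(lambda x: x)        # E(X), close to 8/6
mean_2x = expectation(lambda x: 2 * x)   # E(2X), close to 2 * 8/6

print(mean_x, mean_2x)
```

Passing a different `h` gives any other moment; the factor 2 coming out of the integral is the linearity of expectation at work.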

Additionally, this equation — known as the layered representation — holds for non-negative random variables.

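$$E(X) = \int_{\mathbb{R}^+} (1 - F(x))\,dx \tag{8}$$

Example:

Say $X \sim f(x) = \frac{x}{2}$ on $[0, 2]$, so that $F(x) = \frac{x^2}{4}$.

Then

$$E(X) = \int_0^2 (1 - F(x))\,dx = \int_0^2 \left(1 - \frac{x^2}{4}\right)dx = 2 - \frac{8}{12} = \frac{8}{6}$$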
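As a quick sanity check on the layered representation, one can compare it with the direct definition of the mean. The sketch assumes $F(x) = x^2/4$ and $f(x) = x/2$ on $[0, 2]$; since $1 - F(x) = 0$ beyond 2, the integral over the positive reals reduces to $[0, 2]$. The helper `integrate` is a hypothetical midpoint-rule routine.

```python
def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def F(x):
    """Distribution function F(x) = x^2 / 4 on [0, 2]."""
    return x * x / 4

def f(x):
    """Density f(x) = x / 2 on [0, 2]."""
    return x / 2

# Direct definition: E(X) = integral of x * f(x).
mean_direct = integrate(lambda x: x * f(x), 0.0, 2.0)

# Layered representation: E(X) = integral of the survival function 1 - F(x).
mean_layered = integrate(lambda x: 1.0 - F(x), 0.0, 2.0)

print(mean_direct, mean_layered)
```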

3.2 Population Variance

$$\sigma^2 = E[(X - \mu)^2] = \int_{\mathbb{R}} (x - \mu)^2 f(x)\,dx = \mathrm{Var}(X) \tag{9}$$

Again, this can be reduced to:

$$\sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2 \tag{10}$$

Example:

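Say $X \sim f(x) = \frac{x}{2}$ on $[0, 2]$.

Then

$$\mathrm{Var}(X) = \int_0^2 \left(x - \tfrac{8}{6}\right)^2 \cdot \frac{x}{2}\,dx = E(X^2) - \mu^2 = 2 - \left(\tfrac{8}{6}\right)^2 = \frac{2}{9}$$

where $E(X^2) = \int_0^2 x^2 \cdot \frac{x}{2}\,dx = 2$.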
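Both forms of the variance, the definition (9) and the shortcut (10), can be evaluated numerically for the density $f(x) = x/2$ on $[0, 2]$. In this sketch (with a hypothetical midpoint-rule helper `integrate`) both come out at about 0.2222, that is, 2/9.

```python
def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def f(x):
    """Density f(x) = x / 2 on [0, 2]."""
    return x / 2

mean = integrate(lambda x: x * f(x), 0.0, 2.0)               # mu = E(X)
second_moment = integrate(lambda x: x * x * f(x), 0.0, 2.0)  # E(X^2)

# Equation (9): integrate the squared deviation from the mean.
var_definition = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 2.0)

# Equation (10): second moment minus squared mean.
var_shortcut = second_moment - mean ** 2

print(var_definition, var_shortcut)
```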

4 General Joint Distributions

Two given random variables X and Y have a general, joint distribution that is an extension of the single variable definition. In the discrete case

P((X, Y ) = (x, y)) = p(x, y) (11)

the joint probability mass function. In the continuous case

P((X, Y ) = (x ± , y ± )) = f (x, y) (12)

We generate the marginal distributions for X and Y alone just as we did for contingency tables by summing over all values of the other variable.

$$p_X(x) = \sum_y p(x, y); \qquad p_Y(y) = \sum_x p(x, y) \tag{13}$$

$$f_X(x) = \int_{\mathbb{R}} f(x, v)\,dv; \qquad f_Y(y) = \int_{\mathbb{R}} f(u, y)\,du \tag{14}$$

Two random variables are independent if

$$p_{X,Y}(x, y) = p_X(x)\,p_Y(y) \tag{15}$$

Again same as the discrete case.
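In the discrete case, the marginal sums in (13) and the factorization test in (15) are easy to carry out by machine. The joint pmf below is a made-up 2-by-2 table chosen only for illustration, and `marginals` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the values are illustrative only.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def marginals(pmf):
    """Sum the joint pmf over the other variable, as in equation (13)."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in pmf.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

p_x, p_y = marginals(joint)

# Independence requires p(x, y) == p_X(x) * p_Y(y) in every cell (15).
independent = all(p == p_x[x] * p_y[y] for (x, y), p in joint.items())

print(p_x, p_y, independent)
```

For this particular table the factorization fails, so X and Y are dependent; a table built as an outer product of two marginals would pass the check.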

6 Expectation and Covariance

The main idea is that expectation is a linear operator and that the expectation of a function is the expectation taken over the values of the function.

In a natural extension to two dimensions:

$$E(g(X, Y)) = \begin{cases} \sum_y \sum_x g(x, y)\,p(x, y), & x, y \text{ discrete} \\ \int\!\!\int g(x, y)\,f(x, y)\,dx\,dy, & x, y \text{ continuous} \end{cases}$$

6.1 Example

We get the moments we use in calculation of mean and variance, etc. by choosing the function we take an expectation of.

$g_1(x) = x \;\longrightarrow\; E(g_1(X)) = \mu_X$

$g_2(x, y) = xy \;\longrightarrow\; E(g_2(X, Y)) = E(XY)$

$g_3(y) = (y - \mu_Y)^2 \;\longrightarrow\; E(g_3(Y)) = E((Y - \mu_Y)^2) = \mathrm{Var}(Y)$

7 Covariance

Let $g(X, Y) = [X - \mu_X][Y - \mu_Y]$. Then:

$$\begin{aligned}
E(g(X, Y)) &= E([X - \mu_X][Y - \mu_Y]) \\
&= E(XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y) \\
&= E(XY) - \mu_Y E(X) - \mu_X E(Y) + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y
\end{aligned}$$

This expectation has a special name, the covariance of X, Y. So the covariance of X, Y is

$$\mathrm{Cov}(X, Y) = E([X - \mu_X][Y - \mu_Y]) = E(XY) - \mu_X \mu_Y \tag{22}$$
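The two expressions for the covariance in (22) can be compared on a small discrete example. The joint pmf here is hypothetical (numbers picked for illustration), and `expect` is a helper name introduced for this sketch.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the numbers are illustrative only.
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

def expect(g, pmf):
    """E(g(X, Y)) for a discrete joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)

# Definition: expected product of deviations from the means.
cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint)

# Shortcut: E(XY) minus the product of the means.
cov_shortcut = expect(lambda x, y: x * y, joint) - mu_x * mu_y

print(cov_definition, cov_shortcut)
```

Because the arithmetic is done with `Fraction`, the two values agree exactly rather than up to rounding.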

7.1 Properties of Covariance

7.1.1 Covariance can be negative

$\mathrm{Cov}(X, Y) \in \mathbb{R}$

Note that $\mathrm{Var}(X) \ge 0$.

7.1.2 Independence implies zero Covariance

If $X \perp Y$ then $E(XY) = E(X)E(Y) = \mu_X \mu_Y$, so

$$\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y = \mu_X \mu_Y - \mu_X \mu_Y = 0$$

but!

7.1.3 Zero Covariance does not imply independence

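The fact here is $\mathrm{Cov}(X, Y) = 0 \;\not\Rightarrow\; X \perp Y$.

For an example, take X, Y with this distribution:

$$P(X = 0) = P(X = 1) = P(X = -1) = \frac{1}{3}$$

$$Y = \begin{cases} 0, & X \ne 0 \\ 1, & X = 0 \end{cases}$$

Thus $E(X) = 0$ and $E(XY) = 0$, so $\mathrm{Cov}(X, Y) = 0$, but Y is obviously a function of X.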

In general, for many Y = g(X), where g is symmetric (about zero, for instance), Cov(X, Y ) = 0 but X is — of course — not independent of Y = g(X).
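The counterexample can be verified mechanically. The sketch below assumes X takes the values −1, 0, 1 with probability 1/3 each and Y = 1 exactly when X = 0; `expect` is a hypothetical helper. A single cell where the joint probability differs from the product of marginals is enough to rule out independence.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Joint pmf of (X, Y) with Y = 1 if X == 0 and Y = 0 otherwise.
joint = {(-1, 0): third, (0, 1): third, (1, 0): third}

def expect(g, pmf):
    """E(g(X, Y)) for a discrete joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

cov = expect(lambda x, y: x * y, joint) - (
    expect(lambda x, y: x, joint) * expect(lambda x, y: y, joint)
)

# Marginal probabilities P(X = 0) and P(Y = 0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)

# Joint probability P(X = 0, Y = 0); this pair never occurs.
p_00 = joint.get((0, 0), Fraction(0))

# Independence would need P(X = 0, Y = 0) == P(X = 0) * P(Y = 0).
factorizes_at_00 = (p_00 == p_x0 * p_y0)

print(cov, p_00, p_x0 * p_y0, factorizes_at_00)
```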

7.1.4 Covariance is symmetric

Cov(X, Y ) = Cov(Y, X)

8 Correlation Coefficient

The number ρ, which we introduced as a parameter to the multivariate normal distribution, is called the correlation coefficient

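$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\sigma_X^2 \sigma_Y^2}} = \frac{E([X - \mu_X][Y - \mu_Y])}{\sqrt{E([X - \mu_X]^2)\,E([Y - \mu_Y]^2)}}$$

Fact (a version of the Cauchy-Schwarz inequality):

$$|E([X - \mu_X][Y - \mu_Y])| \le \sqrt{E([X - \mu_X]^2)\,E([Y - \mu_Y]^2)}$$

so $\rho \in [-1, 1]$.

Notice:

$$E(XY) = \mu_X \mu_Y + \rho\,\sigma_X \sigma_Y$$

since $\mathrm{Cov}(X, Y) = \rho\,\sigma_X \sigma_Y$.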

8.1 Properties of ρ

Let $X, Y \sim f_{X,Y}$ with the conditional distribution of $Y \mid X = x$: $f_{Y|X} = f_{X,Y} / f_X$.

Then

$$E(Y \mid X = x) = \int y\, f_{Y|X}\,dy = \frac{\int y\, f_{X,Y}\,dy}{f_X}$$

Remember that the expected value of Y given X = x is a function of the observed value of X, x. Say this expected value is a linear function: set $E(Y \mid X = x) = a + bx$. Call this equation (++).

Let’s derive a general result for the conditional expectation when it is constrained to be a linear function, i.e. let’s solve for the constants a and b.

If we multiply both sides of (++) by $f_X(x)$ and integrate with respect to x, we get:

$$\mu_Y = a + b\mu_X$$

Now multiply both sides of (++) by $x f_X(x)$ and integrate with respect to x; this yields:

$$E(XY) = a\mu_X + bE(X^2)$$

Realizing that $E(X^2) = \sigma_X^2 + \mu_X^2$, the two equations above (for the two unknowns, a and b) give $b = \mathrm{Cov}(X, Y)/\sigma_X^2 = \rho\,\sigma_Y/\sigma_X$ and $a = \mu_Y - b\mu_X$, so the result is:

$$E(Y \mid X = x) = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}(x - \mu_X) \tag{24}$$

N.B.: This is the same as the conditional expectation for the bivariate normal distribution. This suggests a role for the normal distribution in linear conditional expectation. Notice that in equation (24) the expectation is simply μY if ρ = 0. For the bivariate normal distribution, this is equivalent to X ⊥ Y.

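Moreover, if $Y = aX + b$ with $a \ne 0$, then $\mathrm{Cov}(X, Y) = a\sigma_X^2$ and

$$\rho = \frac{a\sigma_X^2}{\sqrt{\sigma_X^2 \cdot a^2 \sigma_X^2}} = \frac{a\sigma_X^2}{|a|\sigma_X^2} = 1 \cdot \mathrm{sgn}(a)$$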
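A small numerical illustration of the last claim: when Y is an exact linear function of X, the correlation is +1 or −1 according to the sign of the slope. The sketch treats a short list of hypothetical X values as equally likely outcomes; `moments` and `correlation` are helper names made up for this note.

```python
import math

def moments(xs, ys):
    """Variances and covariance, treating the (x, y) pairs as equally likely."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    return var_x, var_y, cov

def correlation(xs, ys):
    """Correlation coefficient rho = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    var_x, var_y, cov = moments(xs, ys)
    return cov / math.sqrt(var_x * var_y)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical values of X

rho_pos = correlation(xs, [2.0 * x + 1.0 for x in xs])   # slope a = 2
rho_neg = correlation(xs, [-0.5 * x + 3.0 for x in xs])  # slope a = -0.5

# Slope implied by equation (24), rho * sigma_Y / sigma_X, for the first case.
var_x, var_y, _ = moments(xs, [2.0 * x + 1.0 for x in xs])
slope = rho_pos * math.sqrt(var_y / var_x)

print(rho_pos, rho_neg, slope)
```

The recovered slope matches the coefficient a used to generate Y, which is what equation (24) predicts for an exactly linear relationship.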

8.2 Variance, Again!

This is important for a general equation for the variance of linear combinations $aX + bY$:

$$\begin{aligned}
\mathrm{Var}(aX + bY) &= E[(aX + bY)^2] - [E(aX + bY)]^2 \\
&= E(a^2 X^2 + 2abXY + b^2 Y^2) - [a\mu_X + b\mu_Y]^2 \\
&= E(a^2 X^2) + E(b^2 Y^2) + 2abE(XY) - a^2\mu_X^2 - 2ab\mu_X\mu_Y - b^2\mu_Y^2 \\
&= a^2 E(X^2) - a^2\mu_X^2 + b^2 E(Y^2) - b^2\mu_Y^2 + 2ab(E(XY) - \mu_X\mu_Y)
\end{aligned}$$

so that

$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y) \tag{25}$$
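Equation (25) can be checked exactly on a discrete example. The joint pmf and the coefficients a = 2, b = −3 below are hypothetical choices for illustration; `expect` and `variance` are helper names introduced for this sketch.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the numbers are illustrative only.
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

def expect(g):
    """E(g(X, Y)) under the joint pmf above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def variance(g):
    """Var(g(X, Y)) = E(g^2) - [E(g)]^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

a, b = 2, -3  # hypothetical coefficients

var_x = variance(lambda x, y: x)
var_y = variance(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Left side: variance of the combination computed directly.
lhs = variance(lambda x, y: a * x + b * y)

# Right side: the expansion in equation (25).
rhs = a * a * var_x + b * b * var_y + 2 * a * b * cov_xy

print(lhs, rhs)
```

With opposite-sign coefficients and positive covariance, the cross term is negative and reduces the total variance.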

That is, for any interval $[t_o, t_f]$ in $[a, b]$ the probability $P(t_o \le X \le t_f) = c(t_f - t_o)$, where c is our (usual) constant of proportionality. If we take the entire interval we can solve for c: $1 = P(a \le X \le b) = c(b - a)$ implies $c = \frac{1}{b - a}$, which yields the uniform pdf.

The cumulative distribution function is generated in the usual way

$$F_X(x) = \int_a^x \frac{1}{b - a}\,dt = \frac{x - a}{b - a}, \qquad a \le x \le b$$

and in notation we call $X \sim U[a, b]$. In words, "X is uniformly distributed between a and b". The pdf is:

$$f_X(x) = \begin{cases} \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

I leave it to you to convince yourself of E(X) and Var(X) for the uniform distribution. That’s just figuring out these integrals...

$$E(X) = \int_a^b \frac{x}{b - a}\,dx = \frac{b + a}{2}$$

$$\mathrm{Var}(X) = \int_a^b \frac{(x - E(X))^2}{b - a}\,dx = \frac{(b - a)^2}{12}$$

It is not too hard.
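A numerical check is no substitute for doing the integrals, but it is a useful way to catch algebra slips. The sketch below picks hypothetical endpoints a = 1 and b = 4 and uses a made-up midpoint-rule helper `integrate`.

```python
def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

a, b = 1.0, 4.0  # hypothetical endpoints of the interval

def pdf(x):
    """Uniform density 1 / (b - a) on [a, b]."""
    return 1.0 / (b - a)

mean = integrate(lambda x: x * pdf(x), a, b)
var = integrate(lambda x: (x - mean) ** 2 * pdf(x), a, b)

# Closed forms to compare against.
mean_formula = (b + a) / 2
var_formula = (b - a) ** 2 / 12

print(mean, mean_formula, var, var_formula)
```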

11 Exercises

  • Convince yourself of the expectation and variance for a uniformly distributed random variable on interval [a, b].
  • Verify $\mathrm{Cov}(aX + b, Y) = a\,\mathrm{Cov}(X, Y)$
  • Verify $\mathrm{Cov}\left(\sum_i X_i, \sum_j Y_j\right) = \sum_i \sum_j \mathrm{Cov}(X_i, Y_j)$
  • Let X be a random variable having finite expectation $\mu$ and variance $\sigma^2$. Let $g(\cdot)$ be a twice differentiable function. Show that $E[g(X)] \approx g(\mu) + \frac{g''(\mu)}{2}\sigma^2$. Hint: Expand $g(\cdot)$ in a Taylor series about $\mu$. Now use this to suggest an approach for $\mathrm{Var}(g(X))$.
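To get a feel for the approximation in the last exercise before proving it, one can compare it against a numerically computed expectation. Everything in this sketch is an illustrative assumption: X is taken to be uniform on [0.9, 1.1], g is the natural logarithm, and `integrate` is a hypothetical midpoint-rule helper.

```python
import math

def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

a, b = 0.9, 1.1              # hypothetical support of a uniform X
mu = (a + b) / 2             # mean of U[a, b]
sigma2 = (b - a) ** 2 / 12   # variance of U[a, b]

def g(x):
    """The function whose expectation we approximate."""
    return math.log(x)

def g_second(x):
    """Second derivative of log(x)."""
    return -1.0 / (x * x)

# Numerically computed E[g(X)] under the uniform density 1 / (b - a).
exact = integrate(lambda x: g(x) / (b - a), a, b)

# Second-order approximation from the exercise.
approx = g(mu) + g_second(mu) / 2 * sigma2

# Naive plug-in value g(mu), for comparison.
plug_in = g(mu)

print(exact, approx, plug_in)
```

On this narrow interval the second-order term closes most of the gap between the plug-in value and the true expectation; the approximation can degrade when the spread of X is large relative to the curvature of g.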