Continuous Random Variables: Densities, Expected Values, and Variance, Lecture notes of Calculus

These notes cover continuous random variables, their probability density functions (pdf), and how to compute their expected values and variances. They include definitions, propositions, and examples of the uniform, exponential, and normal distributions. They also cover the relationship between the pdf and the cumulative distribution function (cdf), and the meaning of histograms.

Typology: Lecture notes

2021/2022

Uploaded on 08/01/2022

hal_s95


5 Continuous random variables

We deviate from the order in the book for this chapter, so the subsections in this chapter do not correspond to those in the text.

5.1 Densities of continuous random variables

Recall that in general a random variable X is a function from the sample space to the real numbers. If the range of X is finite or countably infinite, we say X is a discrete random variable. We now consider random variables whose range is neither finite nor countably infinite. For example, the range of X could be an interval, or the entire real line.

For discrete random variables the probability mass function (pmf) is f_X(x) = P(X = x). If we want to compute the probability that X lies in some set, e.g., an interval [a, b], we sum the pmf:

$$P(a \le X \le b) = \sum_{x:\, a \le x \le b} f_X(x)$$

A special case of this is

$$P(X \le b) = \sum_{x:\, x \le b} f_X(x)$$

For continuous random variables, we will have integrals instead of sums.

Definition 1. A random variable X is continuous if there is a non-negative function f_X(x), called the probability density function (pdf) or just density, such that

$$P(X \le t) = \int_{-\infty}^{t} f_X(x)\,dx$$

Proposition 1. If X is a continuous random variable with density f(x), then

  1. P(X = x) = 0 for any x ∈ R.
  2. P(a ≤ X ≤ b) = $\int_a^b f(x)\,dx$
  3. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

Proof. First we observe that subtracting the two equations

$$P(X \le b) = \int_{-\infty}^{b} f_X(x)\,dx, \qquad P(X \le a) = \int_{-\infty}^{a} f_X(x)\,dx$$

gives

$$P(X \le b) - P(X \le a) = \int_{a}^{b} f_X(x)\,dx$$

and we have P(X ≤ b) − P(X ≤ a) = P(a < X ≤ b), so

$$P(a < X \le b) = \int_{a}^{b} f_X(x)\,dx \tag{1}$$

Now for any n

$$P(X = x) \le P(x - 1/n < X \le x) = \int_{x - 1/n}^{x} f_X(t)\,dt$$

As n → ∞, the integral goes to zero, so P(X = x) = 0. Property 2 now follows from eq. (1) since

P(a ≤ X ≤ b) = P(a < X ≤ b) + P(X = a) = P(a < X ≤ b)

Note that since the probability X equals any single real number is zero, P(a ≤ X ≤ b), P(a < X ≤ b), P(a ≤ X < b), and P(a < X < b) are all the same. Property 3 is just the fact that P (−∞ < X < ∞) = 1.

Caution: Often the range of X is not the entire real line. Outside of the range of X the density f_X(x) is zero. So the definition of f_X(x) will typically involve cases: in one region it is given by some formula, elsewhere it is simply 0. So integrals over all of R which contain f_X(x) will reduce to integrals over a subset of R. If you mistakenly integrate the formula over the entire real line you will get nonsense.
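The caution can be illustrated numerically. The following is a minimal Python sketch, assuming the uniform density on [0, 2] as an example; the helper `integrate` (a midpoint rule) and the function names are illustrative choices, not part of the notes. Integrating the density defined by cases over a wide interval gives 1, while integrating the bare formula 1/(b − a) over the same interval does not.

```python
def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

A, B = 0.0, 2.0  # example endpoints (an assumption for illustration)

def density(x):
    """Uniform density on [A, B], defined by cases."""
    return 1.0 / (B - A) if A <= x <= B else 0.0

def formula_only(x):
    """The formula 1/(B - A) with the 'zero elsewhere' case forgotten."""
    return 1.0 / (B - A)

correct = integrate(density, -10.0, 10.0)     # close to 1
wrong = integrate(formula_only, -10.0, 10.0)  # close to 10, not a probability
print(correct, wrong)
```

The second number is larger than 1, which cannot be a total probability; this is the kind of nonsense the caution warns about.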

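Corollary 1. If X is a continuous random variable with finite variance σ^2 and mean μ, then

$$\sigma^2 = E[X^2] - \mu^2 = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \mu^2$$

Proof. By the theorem

$$\begin{aligned}
\sigma^2 = E[(X - \mu)^2] &= \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx = \int_{-\infty}^{\infty} [x^2 - 2\mu x + \mu^2]\, f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - 2\mu \int_{-\infty}^{\infty} x f_X(x)\,dx + \mu^2 \int_{-\infty}^{\infty} f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - 2\mu^2 + \mu^2 = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \mu^2
\end{aligned}$$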

5.3 Catalog

As with discrete RV’s, two continuous RV’s defined on completely different probability spaces can have the same density.

Definition 4. Two continuous random variables are identically distributed if they have the same pdf.

There are certain densities that come up a lot. So we start a catalog of them. Note that the mean and variance of the RV only depend on its pdf.

Uniform: (two parameters a, b ∈ R with a < b) The uniform density on [a, b] is

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & \text{if } a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

We have seen the uniform distribution before. Previously we said that to compute the probability X is in some subinterval [c, d] of [a, b] you take the length of that subinterval divided by the length of [a, b]. This is of course what you get when you compute

$$\int_{c}^{d} f_X(x)\,dx = \int_{c}^{d} \frac{1}{b - a}\,dx = \frac{d - c}{b - a}$$

Next we find the mean and variance of the uniform distribution on [a, b]. The mean is

$$\mu = \int_{a}^{b} x f(x)\,dx = \int_{a}^{b} \frac{x}{b - a}\,dx = \frac{1}{2}\,\frac{b^2 - a^2}{b - a} = \frac{a + b}{2}$$

For the variance we have to first compute

$$E[X^2] = \int_{a}^{b} x^2 f(x)\,dx \tag{3}$$

We then subtract the square of the mean and find σ^2 = (b − a)^2 /12.
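The intermediate steps are not written out in the notes; one way to fill them in, as a sketch using only the uniform density 1/(b − a) on [a, b] and the mean (a + b)/2 found above, is:

```latex
\begin{align*}
E[X^2] &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3} \\
\sigma^2 &= E[X^2] - \mu^2 = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4} \\
         &= \frac{4a^2 + 4ab + 4b^2 - 3a^2 - 6ab - 3b^2}{12} = \frac{(b-a)^2}{12}
\end{align*}
```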

Exponential: (one real parameter λ > 0 )

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$

Check that its total integral is 1. Note that the range is [0, ∞). One of the homework problems is to compute its mean and variance.
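A sketch of the check that the total integral is 1, using the fact that the exponential density is zero for x < 0 so only [0, ∞) contributes:

```latex
\begin{align*}
\int_{-\infty}^{\infty} f(x)\,dx
  = \int_{0}^{\infty} \lambda e^{-\lambda x}\,dx
  = \Big[ -e^{-\lambda x} \Big]_{0}^{\infty}
  = 0 - (-1) = 1
\end{align*}
```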

Normal: (two real parameters σ > 0, μ ∈ R )

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)$$

The range of a normal RV is the entire real line. It is anything but obvious that the integral of this function is 1. Try to show it.
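A numerical sanity check is not a proof, but it can make the claim plausible. The following is a minimal Python sketch, assuming the standard form of the normal density, 1/(σ√(2π)) exp(−(x − μ)²/(2σ²)); the helper `integrate`, the truncation at 10σ, and the parameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with parameters mu and sigma > 0."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

mu, sigma = 1.5, 2.0  # example parameter values (assumptions for illustration)
# The density is negligible more than 10 sigma away from mu, so we truncate there.
total = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
print(total)  # very close to 1
```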

End of lecture - Fri, Oct 6

Cauchy:

$$f(x) = \frac{1}{\pi (1 + x^2)}$$

Example (skipped): Suppose X has the Cauchy distribution. Find the number c with the property that P(X ≥ c) = 1/4.
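The notes skip this example; one possible solution, sketched here using the antiderivative arctan(x) of 1/(1 + x²), is:

```latex
\begin{align*}
P(X \ge c) &= \int_{c}^{\infty} \frac{1}{\pi(1 + x^2)}\,dx
            = \frac{1}{\pi} \Big[ \arctan x \Big]_{c}^{\infty}
            = \frac{1}{\pi} \left( \frac{\pi}{2} - \arctan c \right) \\
P(X \ge c) = \frac{1}{4}
  &\iff \arctan c = \frac{\pi}{4}
  \iff c = 1
\end{align*}
```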

Example: Let X be a discrete RV whose pmf is given in the table.

x        2     3     4     5     6
f_X(x)   1/8   1/8   3/8   2/8   1/

[Graph omitted]

Example: Compute cdf of exponential distribution.
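The computation is not written out in the notes. A sketch, assuming the cdf is defined by F(x) = P(X ≤ x) and using the exponential density with parameter λ from the catalog:

```latex
\begin{align*}
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt =
\begin{cases}
  \displaystyle \int_{0}^{x} \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}, & \text{if } x \ge 0 \\[1ex]
  0, & \text{if } x < 0
\end{cases}
\end{align*}
```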

Theorem 2. Let X be a continuous RV with pdf f (x) and cdf F (x). Then they are related by

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = F'(x)$$

Proof. The first equation is immediate from the definition of the cdf. To get the second equation, differentiate the first equation and remember that the fundamental theorem of calculus says

$$\frac{d}{dx} \int_{a}^{x} f(t)\,dt = f(x)$$

Theorem 3. For any random variable the cdf satisfies

  1. F (x) is non-decreasing, 0 ≤ F (x) ≤ 1.
  2. lim_{x→−∞} F(x) = 0, lim_{x→∞} F(x) = 1.
  3. For a continuous random variable the cdf is continuous.
  4. For a discrete random variable the cdf is piecewise constant. The set of points where it jumps is the range of X. If x is a point where it has a jump, then the height of the jump is P(X = x).

Proof. 1 is obvious .... To prove 2, let x_n → ∞. Assume that x_n is increasing. Let E_n = {X ≤ x_n}. Then E_n is an increasing sequence of events. By the continuity of the probability measure,

$$P\left( \bigcup_{n=1}^{\infty} E_n \right) = \lim_{n \to \infty} P(E_n)$$

Since x_n → ∞, every outcome is in E_n for large enough n. So the union of the E_n over n = 1, 2, ... is Ω. So

$$\lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} P(E_n) = 1 \tag{7}$$

The proof that the limit as x → −∞ is 0 is similar. GAP

We will not need the following theorem, but a natural question is whether all functions F with the properties in the previous theorem are the cdf of some random variable.

Theorem 4. Let F (x) be a function from R to [0, 1] such that

  1. F (x) is non-decreasing.
  2. lim_{x→−∞} F(x) = 0, lim_{x→∞} F(x) = 1.
  3. F (x) is continuous from the right.

Then F (x) is the cdf of some random variable, i.e., there is a probability space (Ω, F, P) and a random variable X on it such that F (x) = P(X ≤ x).

The proof of this theorem is way beyond the scope of this course. In fact, the resulting random variable need not be a discrete or continuous random variable as we have defined them.

5.5 Function of a random variable

Let X be a continuous random variable and g : R → R. Then Y = g(X) is a new random variable. We want to find its density. This is not as easy as in the discrete case. In particular f_Y(y) is not

$$\sum_{x:\, g(x) = y} f_X(x).$$

Proof.

$$P(Y \le y) = P(F^{-1}(U) \le y) = P(U \le F(y)) = F(y) \tag{11}$$

Application: My computer has a routine to generate random numbers that are uniformly distributed on [0, 1]. We want to write a routine to generate numbers that have an exponential distribution with parameter λ.
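Such a routine can be sketched in Python as follows. This is a minimal sketch, assuming the exponential cdf F(x) = 1 − exp(−λx) for x ≥ 0, whose inverse is F^{-1}(u) = −ln(1 − u)/λ; the function name `sample_exponential`, the seed, and the value λ = 2 are illustrative choices. The check at the end compares the fraction of samples below 0.5 with F(0.5).

```python
import math
import random

def sample_exponential(lam):
    """Return one exponential(lam) sample as F^{-1}(U) with U uniform on [0, 1)."""
    u = random.random()              # uniform on [0, 1)
    return -math.log(1.0 - u) / lam  # F^{-1}(u) for F(x) = 1 - exp(-lam * x)

random.seed(0)  # fixed seed so the run is reproducible
lam = 2.0       # example parameter value (an assumption for illustration)
samples = [sample_exponential(lam) for _ in range(100_000)]

# Empirical check: the fraction of samples <= 0.5 should be close to F(0.5).
frac_below = sum(s <= 0.5 for s in samples) / len(samples)
target = 1.0 - math.exp(-lam * 0.5)
print(frac_below, target)
```

Using 1 − u rather than u keeps the argument of the logarithm strictly positive, since `random.random()` can return 0 but never 1.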

How do you simulate normal RV’s? Not so easy since the cdf cannot be explicitly computed. More on this later. When Y = g(X) and we know the pdf of X, then we have seen how to compute the pdf of Y. If g is increasing on the range of X or decreasing on the range of X, then there is a formula.

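Theorem 5. Let g be strictly increasing or strictly decreasing on the range of X. Assume also that g is differentiable. Then

$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|$$

where g^{-1} is the inverse function of g, i.e., the function such that g^{-1}(g(y)) = y. The absolute value is needed when g is decreasing, since then the derivative of g^{-1} is negative.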

Note that g^{-1} is not 1/g. A formula from calculus relates the derivative of the inverse function to the derivative of the original function. It says

$$\frac{d}{dy} g^{-1}(y) = \frac{1}{g'(g^{-1}(y))}$$

Example: X is exponential with λ = 1. Y = exp(−X). So g(x) = exp(−x) and g^{-1}(y) = −ln(y). GAP
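The notes leave the rest of this example open. One way to finish it, as a sketch that applies the change-of-variables formula with the absolute value of the derivative (g is decreasing here) and uses f_X(x) = e^{−x} for x ≥ 0:

```latex
\begin{align*}
g^{-1}(y) &= -\ln y, \qquad \frac{d}{dy}\, g^{-1}(y) = -\frac{1}{y}, \qquad 0 < y \le 1 \\
f_Y(y) &= f_X(g^{-1}(y)) \left| \frac{d}{dy}\, g^{-1}(y) \right|
        = e^{-(-\ln y)} \cdot \frac{1}{y}
        = y \cdot \frac{1}{y} = 1, \qquad 0 < y \le 1
\end{align*}
```

Outside (0, 1] the density of Y is zero, so under these assumptions Y is uniformly distributed on [0, 1].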

Proof of theorem GAP

5.6 Histograms and the meaning of the pdf

For a discrete RV the pmf f(x) has a direct interpretation. It is the probability that X = x. For a continuous RV, the pdf f(x) is not the probability that X = x (which is zero), nor is it the probability of anything. If δ > 0 is small, then

$$\int_{x - \delta}^{x + \delta} f(u)\,du \approx 2\delta f(x)$$

This is P(x − δ ≤ X ≤ x + δ). So the probability X is in the small interval [x − δ, x + δ] is approximately f(x) times the length of the interval. So f(x) is a probability density.

Histograms are closely related to the pdf and can be thought of as “experimental pdf’s.” Suppose we generate N independent random samples of X where N is large. We divide the range of X into intervals of width ∆x (usually called “bins”). The probability X lands in a particular bin is P(x ≤ X ≤ x + ∆x) ≈ f(x)∆x. So we expect approximately N f(x)∆x of our N samples to fall in this bin.

To construct a histogram of our N samples we first count how many fall in each bin. We can represent this graphically by drawing a rectangle for each bin whose base is the bin and whose height is the number of samples in the bin. This is usually called a frequency plot. To make it look like our pdf we should rescale the heights so that the area of a rectangle is equal to the fraction of the samples in that bin. So the height of a rectangle should be

$$\frac{\text{number of samples in bin}}{N\,\Delta x}$$

With these heights the rectangles give the histogram. As we observed above, the number of our N samples in the bin will be approximately N f(x)∆x, so the above is approximately f(x). So if N is large and ∆x is small, the histogram will approximate the pdf.
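The construction can be tried out numerically. The following is a minimal Python sketch, assuming samples from the exponential distribution with λ = 1 (drawn with the standard library's `random.expovariate`); the values of N, ∆x, the number of bins, and the seed are illustrative choices. It computes the rescaled heights (count in bin)/(N ∆x) and prints them next to the pdf evaluated at the bin midpoints.

```python
import math
import random

random.seed(1)   # fixed seed so the run is reproducible
N = 100_000      # number of samples
dx = 0.1         # bin width
num_bins = 20    # the bins cover [0, 2)

samples = [random.expovariate(1.0) for _ in range(N)]

# Count how many samples fall in each bin [k*dx, (k+1)*dx).
counts = [0] * num_bins
for s in samples:
    k = int(s / dx)
    if k < num_bins:
        counts[k] += 1

# Rescale so that each rectangle's area is the fraction of samples in its bin.
heights = [c / (N * dx) for c in counts]

for k in range(num_bins):
    x_mid = (k + 0.5) * dx
    print(f"{x_mid:4.2f}  histogram={heights[k]:.3f}  pdf={math.exp(-x_mid):.3f}")
```

Making N larger reduces the random scatter of the heights, and making ∆x smaller reduces the error from treating the pdf as constant across a bin.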

5.7 More on expected value

The material in this section will not be “on the test.”

For a continuous RV we defined the expected value E[X] to be $\int_{-\infty}^{\infty} x f_X(x)\,dx$. This is not really how it should be defined. There is a way to define E[X] for any random variable and then prove that, in the case of a continuous RV, it is given by $\int_{-\infty}^{\infty} x f_X(x)\,dx$. A proper discussion of how to define E[X] for any RV requires the theory of abstract Lebesgue integration, which is way beyond the level of this course. Nonetheless, we can still give a non-rigorous explanation of how to define E[X].

When n is large, the integrals in the sum are over a very small interval. In this interval, x is very close to k/n. In fact, they differ by at most 1/n. So the limit as n → ∞ of the above should be

$$\sum_{k=-Mn}^{Mn-1} \int_{k/n}^{(k+1)/n} x f_X(x)\,dx = \int_{-M}^{M} x f_X(x)\,dx = \int_{-\infty}^{\infty} x f_X(x)\,dx$$

The last equality comes from the fact that fX (x) is zero outside [−M, M ]. The above is not a proof, but it should make the following plausible:

Theorem 6. Let X be a continuous RV. If we define E[X] to be lim_{n→∞} E[X_n], then

$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$$

We can now use these ideas to give a non-rigorous derivation of a theorem we stated before:

Theorem 7. Let X be a continuous RV, g a function from R to R. Let Y = g(X). Then

$$E[Y] = E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$$

Proof. Since we do not know how to find the density of Y, we cannot prove this yet. We just give a non-rigorous derivation. Let X_n be the sequence of discrete RV’s that approximated X defined above. Then g(X_n) are discrete RV’s. They approximate g(X). In fact, if the range of X is bounded and g is continuous, then g(X_n) will converge uniformly to g(X). So E[g(X_n)] should converge to E[g(X)]. Now g(X_n) is a discrete RV, and by the law of the unconscious statistician

$$E[g(X_n)] = \sum_{x} g(x) f_{X_n}(x) \tag{13}$$

Looking back at our previous derivation we see this is

$$E[g(X_n)] = \sum_{k=-Mn}^{Mn-1} g\left( \frac{k}{n} \right) \int_{k/n}^{(k+1)/n} f_X(x)\,dx = \sum_{k=-Mn}^{Mn-1} \int_{k/n}^{(k+1)/n} g\left( \frac{k}{n} \right) f_X(x)\,dx$$

which converges to

$$\int g(x) f_X(x)\,dx \tag{14}$$
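The convergence of the discretized sum can be seen in a small numerical experiment. The following is a minimal Python sketch, assuming X is uniform on [0, 1] (so each interval [k/n, (k+1)/n) with 0 ≤ k < n has probability 1/n) and taking g(x) = x² as an example; the function name `discretized_expectation` is an illustrative choice.

```python
def discretized_expectation(g, n):
    """Sum of g(k/n) * P(k/n <= X < (k+1)/n) for X uniform on [0, 1]."""
    bin_prob = 1.0 / n  # integral of the uniform density over one interval of width 1/n
    return sum(g(k / n) * bin_prob for k in range(n))

def g(x):
    """Example function (an assumption for illustration)."""
    return x * x

# As n grows, the sums approach the integral of x^2 over [0, 1], which is 1/3.
for n in (10, 100, 1000):
    print(n, discretized_expectation(g, n))
```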

Recall that for a discrete random variable that only takes on values in 0, 1, 2, · · ·, we showed in a homework problem that

$$E[X] = \sum_{k=0}^{\infty} P(X > k) \tag{15}$$

There is a similar result for non-negative continuous random variables.

Theorem 8. Let X be a non-negative continuous random variable with cdf F (x). Then

$$E[X] = \int_{0}^{\infty} [1 - F(x)]\,dx \tag{16}$$

provided the integral converges.

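Proof. We use integration by parts on the integral. Let u(x) = 1 − F(x) and dv = dx. So du = −f dx and v = x. So

$$\int_{0}^{\infty} [1 - F(x)]\,dx = x(1 - F(x))\Big|_{x=0}^{\infty} + \int_{0}^{\infty} x f(x)\,dx = E[X] \tag{17}$$

Note that the boundary term at ∞ is zero: F(x) → 1 as x → ∞, and when E[X] is finite we have x(1 − F(x)) ≤ $\int_{x}^{\infty} t f(t)\,dt$, which goes to 0 as x → ∞.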
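The formula can be checked on a case where the mean is already known. The following is a minimal Python sketch, assuming X is uniform on [1, 3], whose mean is (a + b)/2 = 2 by the catalog computation; the cdf used here follows from integrating the uniform density, and the helper `integrate` (a midpoint rule) is an illustrative choice.

```python
def integrate(f, lo, hi, n=3000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

A, B = 1.0, 3.0  # example endpoints with A >= 0 (an assumption for illustration)

def cdf(x):
    """cdf of the uniform distribution on [A, B]."""
    if x < A:
        return 0.0
    if x > B:
        return 1.0
    return (x - A) / (B - A)

# 1 - F(x) is zero for x > B, so integrating over [0, B] captures the whole integral.
tail_integral = integrate(lambda x: 1.0 - cdf(x), 0.0, B)
mean = (A + B) / 2.0
print(tail_integral, mean)  # both are 2 up to rounding
```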

We can use the above to prove the law of the unconscious statistician for a special case. We assume that X ≥ 0 and that the function g is from [0, ∞) into [0, ∞), is strictly increasing, and g(0) = 0. Note that this implies that g has an inverse. Then

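$$\begin{aligned}
E[Y] &= \int_{0}^{\infty} [1 - F_Y(x)]\,dx = \int_{0}^{\infty} [1 - P(Y \le x)]\,dx && (18) \\
&= \int_{0}^{\infty} [1 - P(g(X) \le x)]\,dx = \int_{0}^{\infty} [1 - P(X \le g^{-1}(x))]\,dx && (19) \\
&= \int_{0}^{\infty} [1 - F_X(g^{-1}(x))]\,dx && (20)
\end{aligned}$$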