Chapter 4. Continuous Random Variables 4.1, Schemes and Mind Maps of Statistics

2021/2022

Uploaded on 08/01/2022

hal_s95
Chapter 4. Continuous Random Variables

4.1: Continuous Random Variables Basics

Slides (Google Drive) Alex Tsun Video (YouTube)

Up to this point, we have only been talking about discrete random variables - ones that only take values in a countable (finite or countably infinite) set like the integers or a subset. What if we wanted to model quantities that were continuous - that could take on uncountably infinitely many values? If you haven’t studied or seen cardinality (or types of infinities) before, you can think of this as being intervals of the real line, which take decimal values. Our tools from the previous chapter were not suitable to modelling these situations, and so we need a new type of random variable.

Definition 4.1.1: Continuous Random Variables

A continuous random variable is a random variable that takes values from an uncountably infinite set, such as the set of real numbers or an interval. For example, height (5.6312435 feet, 6.1123 feet, etc.), weight (121.33567 lbs, 153.4642 lbs, etc.) and time (2.5644 seconds, 9321.23403 seconds, etc.) are continuous random variables that take on values in a continuum.

Why do we need continuous random variables?

Suppose we want a random number in the interval [0, 10], with each possibility being “equally likely”.

  • What is P(X = 3.141592) for such a random variable X? That is, if I chose a random decimal number (with infinite precision/decimal places), what is the probability that you guess it exactly right (matching infinitely many decimal places)? The probability is actually 0; it’s not even a tiny positive number!
  • What is P(5 ≤ X ≤ 8) for such a random variable X? That is, what if you were allowed to guess a range instead of a single number? As you might expect, the probability is (size of the required interval) / (size of the total interval) = 3/10, since the random number is uniformly distributed.

Suppose we want to study the set of possible heights (in feet) a person can have, supposing that the range of possible heights is the interval [1, 8].

  • What is the probability that someone has a height of 5.2311333 feet? This is again 0, since you have to be exactly precise!
  • What is the probability that someone has a height between 5 and 6 feet? This is non-zero, since we are studying an interval. It isn’t necessarily (6 − 5)/(8 − 1) = 1/7 though, since heights aren’t necessarily uniformly distributed! More people will have heights in the interval [4, 6] feet than, say, [1, 3] feet.

Notice that, since these values can have infinite precision, the probability that a continuous random variable takes any one specific value is 0, in contrast to discrete random variables.
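A quick simulation can make the two uniform-number claims concrete. The sketch below assumes Python's standard `random` module as the source of uniform numbers; the function names, sample size and seed are illustrative choices and not part of the original notes.

```python
import random

def estimate_interval_probability(low, high, a, b, n_samples=200_000, seed=0):
    """Estimate P(a <= X <= b) for X uniform on [low, high] by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_samples):
        x = rng.uniform(low, high)
        if a <= x <= b:
            hits += 1
    return hits / n_samples

def count_exact_hits(low, high, target, n_samples=200_000, seed=0):
    """Count how many simulated draws equal `target` exactly."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_samples) if rng.uniform(low, high) == target)

interval_estimate = estimate_interval_probability(0, 10, 5, 8)
exact_hits = count_exact_hits(0, 10, 3.141592)

print(interval_estimate)  # close to 3/10
print(exact_hits)         # draws that matched 3.141592 exactly
```

Floating-point numbers have finite precision, so this only approximates a truly continuous variable, but the pattern matches the argument: intervals collect a share of the draws proportional to their length, while a single point essentially never gets hit.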

2 Probability & Statistics with Applications to Computing 4.

4.1.1 Probability Density Functions (PDFs)

Every continuous random variable has a probability density function (PDF), instead of a probability mass function (PMF), that defines the relative likelihood that a random variable X has a particular value. Why do we need this new construct? We already said that P (X = a) = 0 for any value of a, and so a “PMF” for a continuous random variable would equal 0 for any input and be useless. It wouldn’t satisfy the constraint that the sum of the probabilities is 1 (assuming we could even sum over uncountably many values; we can’t). Instead, we have the idea of a probability density function where the x-axis has values in the random variable’s range (usually an interval), and the y-axis has the probability density (not mass), which is explained below.

A PDF may look something like this:

The probability density function fX has some characteristic properties (denoted with fX to distinguish from PMFs pX ). Notice again I will use different dummy variables inside the function like fX (z) or fX (t) to ensure you get the idea that the density is fX (subscript indicates for rv X) and the dummy variable can be anything.

  • $f_X(z) \ge 0$ for all $z \in \mathbb{R}$; i.e., it is always non-negative, just like a probability mass function.
  • $\int_{-\infty}^{\infty} f_X(t)\,dt = 1$; i.e., the area under the entire curve is equal to 1, just like the sum of all the probabilities of a discrete random variable equals 1.
  • $P(a \le X \le b) = \int_a^b f_X(w)\,dw$; i.e., the probability that X lies in the interval from a to b is the area under the curve from a to b. This is key: integrating $f_X$ gives us probabilities.


we see their ratios. Since the density is twice as high at u as it is at v, it means we are twice as likely to get a point “near” u as we are to get one “near” v.

Definition 4.1.2: Probability Density Function (PDF)

Let X be a continuous random variable (one whose range is typically an interval or union of intervals). The probability density function (PDF) of X is the function fX : R → R, such that the following properties hold:

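  • $f_X(z) \ge 0$ for all $z \in \mathbb{R}$
  • $\int_{-\infty}^{\infty} f_X(t)\,dt = 1$
  • $P(a \le X \le b) = \int_a^b f_X(w)\,dw$
  • $P(X = y) = 0$ for any $y \in \mathbb{R}$
  • The probability that X is close to q is proportional to its density $f_X(q)$:

$$P(X \approx q) = P\left(q - \frac{\varepsilon}{2} \le X \le q + \frac{\varepsilon}{2}\right) \approx \varepsilon f_X(q)$$

  • Ratios of probabilities of being “near points” are maintained:

$$\frac{P(X \approx u)}{P(X \approx v)} \approx \frac{\varepsilon f_X(u)}{\varepsilon f_X(v)} = \frac{f_X(u)}{f_X(v)}$$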
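The last two properties can be checked numerically. The sketch below is only an illustration: the density $f(x) = 3x^2$ on [0, 1] is a hypothetical example not taken from the notes, and `integrate` is a plain midpoint Riemann sum written for this purpose rather than a library routine.

```python
def density(x):
    """Hypothetical PDF used only for illustration: f(x) = 3x^2 on [0, 1]."""
    return 3 * x * x if 0 <= x <= 1 else 0.0

def integrate(f, a, b, n_steps=10_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    width = (b - a) / n_steps
    return sum(f(a + (i + 0.5) * width) for i in range(n_steps)) * width

total_area = integrate(density, 0, 1)  # a valid density has total area 1

eps = 0.01
u, v = 0.8, 0.4
near_u = integrate(density, u - eps / 2, u + eps / 2)  # P(X is within eps/2 of u)
near_v = integrate(density, v - eps / 2, v + eps / 2)  # P(X is within eps/2 of v)

print(total_area)                                # close to 1
print(near_u, eps * density(u))                  # the two values nearly agree
print(near_u / near_v, density(u) / density(v))  # probability ratio vs. density ratio
```

Here the density at u = 0.8 is four times the density at v = 0.4, and the probabilities of landing near those points come out in almost exactly the same 4-to-1 ratio.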

4.1.2 Cumulative Distribution Functions (CDFs)

Here is the density function of a “uniform” random variable on the interval [0, 1]:


We know this is valid, because the area under the curve is the area of a square with side lengths 1, which is 1 · 1 = 1.

We define the cumulative distribution function (CDF) of X to be $F_X(w) = P(X \le w)$; that is, all the area to the left of w under the density function. Note that we also have CDFs for discrete random variables; they are defined in exactly the same way (the probability of being less than or equal to a certain value)! They just don’t usually have a nice closed form like they do for continuous RVs. For continuous random variables, the CDF at w is just the cumulative area to the left of w, which can be found by an integral (the dummy variable of integration should be different from the input variable w):

$$F_X(w) = P(X \le w) = \int_{-\infty}^{w} f_X(y)\,dy$$

Let’s try to compute the CDF of this uniform random variable on [0, 1]. There are three cases to consider here.

  • If w < 0, $F_X(w) = 0$ since $\Omega_X = [0, 1]$. For example, if w = −1, then $F_X(w) = P(X \le -1) = 0$ since there is no chance that X ≤ −1. Formally, there is also no area to the left of w = −1, as you can see from the PDF above, so the integral evaluates to 0!
  • If 0 ≤ w ≤ 1, the area up to w is a rectangle of height 1 and width w (see below), so $F_X(w) = w$. That is, $P(X \le w) = w$. For example, if w = 0.5, then the probability that X ≤ 0.5 is just 0.5, since X is equally likely to be anywhere in $\Omega_X = [0, 1]$! Note that we didn’t need an integral here: when the region is a nice shape, we can just read off the area to the left of w.
  • If w > 1, all the area lies to the left of w, so $F_X(w) = 1$. For example, if w = 2, then $F_X(w) = P(X \le 2) = 1$ since X is always between 0 and 1 (so X must be less than or equal to 2). Formally, the cumulative area to the left of w = 2 is 1 (just the area of the square)!
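The three cases translate directly into a piecewise function. This is a minimal sketch; the name `uniform_cdf` is an illustrative choice.

```python
def uniform_cdf(w):
    """CDF of a uniform random variable on [0, 1], following the three cases."""
    if w < 0:
        return 0.0       # no area to the left of w
    if w <= 1:
        return float(w)  # rectangle of height 1 and width w
    return 1.0           # the whole unit square lies to the left of w

for w in (-1, 0.5, 2):
    print(w, uniform_cdf(w))
```

Evaluating it at w = −1, 0.5 and 2 reproduces the three worked values 0, 0.5 and 1 from the cases above.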


Definition 4.1.3: Cumulative Distribution Function (CDF)

Let X be a continuous random variable (one whose range is typically an interval or union of intervals). The cumulative distribution function (CDF) of X is the function FX : R → R such that:

  • $F_X(t) = P(X \le t) = \int_{-\infty}^{t} f_X(w)\,dw$ for all $t \in \mathbb{R}$
  • $\frac{d}{du} F_X(u) = f_X(u)$
  • $P(a \le X \le b) = F_X(b) - F_X(a)$
  • $F_X$ is monotone increasing, since $f_X \ge 0$. That is, $F_X(c) \le F_X(d)$ for $c \le d$.
  • $\lim_{v \to -\infty} F_X(v) = P(X \le -\infty) = 0$
  • $\lim_{v \to +\infty} F_X(v) = P(X \le +\infty) = 1$
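As a quick sanity check of these properties, take the uniform random variable on [0, 1] from earlier in this section, whose CDF is $F_X(w) = w$ for $0 \le w \le 1$. The endpoints 0.2 and 0.7 below are arbitrary values chosen only for illustration.

```latex
\begin{align*}
\frac{d}{dw} F_X(w) &= \frac{d}{dw}\, w = 1 = f_X(w) && \text{for } 0 < w < 1 \\
P(0.2 \le X \le 0.7) &= F_X(0.7) - F_X(0.2) = 0.7 - 0.2 = 0.5 \\
\lim_{v \to -\infty} F_X(v) &= 0, \qquad \lim_{v \to +\infty} F_X(v) = 1
\end{align*}
```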

Example(s)

Suppose the number of hours past noon at which a package is delivered is modelled by the following PDF:

$$f_X(x) = \begin{cases} x/10 & 0 \le x \le 2 \\ c & 2 < x \le 6 \\ 0 & \text{otherwise} \end{cases}$$

Here is a graph of the PDF as described above:

  1. What is the range ΩX?
  2. What is the value of c that makes fX a valid density function?
  3. Find the cumulative distribution function (CDF) of X, FX (x), and make sure to define it piecewise for any real number x.

  4. What is the probability that the delivery arrives between 2pm and 6pm?
  5. What is the expected time that the package arrives at?

Solution

  1. The range is all values where the density is nonzero; in our case, that is ΩX = [0, 6] (or (0, 6)), but we don’t care about single points or endpoints because the probability of being exactly that value is 0.
  2. Formally, we need the density function to integrate to 1; that is,

$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$

Since the density function is split into three parts, we can split our integral into three. However, anywhere the density is zero, we will get an integral of zero, so we’ll only set up the two integrals that are nontrivial:

$$\int_0^2 \frac{x}{10}\,dx + \int_2^6 c\,dx = 1$$

Solving this equation for c would definitely work. But let’s try to use geometry instead, since we know how to compute the areas of a triangle and a rectangle. The left integral is the area of the triangle with base from 0 to 2 and height c, so that area is 2c/2 = c (the area of a triangle is b · h/2). The area of the rectangle with base from 2 to 6 and height c is 4c. We need the total area c + 4c = 1, so c = 1/5. (Evaluating the left integral directly gives $[x^2/20]_0^2 = 1/5$, so the equation reads 1/5 + 4c = 1, which again yields c = 1/5.)

  3. Our CDF needs four cases: when x < 0, when 0 ≤ x ≤ 2, when 2 < x ≤ 6, and when x > 6.

(a) The outer cases are usually the easiest ones: if x < 0, then $F_X(x) = P(X \le x) = 0$ since X cannot be less than zero.

(b) If x > 6, then $F_X(x) = P(X \le x) = 1$ since X is guaranteed to be at most 6.

(c) For 0 ≤ x ≤ 2, we need the cumulative area to the left of x, which happens to be a triangle with base x and height x/10, so the area is $x^2/20$. Alternatively, evaluate the integral

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt = \int_0^x \frac{t}{10}\,dt = \frac{x^2}{20}$$

(d) For 2 < x ≤ 6, we have the entire triangle of area 2 · (1/5) · 0.5 = 1/5, but also a rectangle of base x − 2 and height 1/5, for a total area of 1/5 + (1/5)(x − 2) = x/5 − 1/5. Alternatively, the integral would be

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt = \int_0^2 \frac{t}{10}\,dt + \int_2^x \frac{1}{5}\,dt = \frac{x}{5} - \frac{1}{5}$$

Again, I skipped all the integral evaluation steps as they are purely computational, but feel free to verify!

Finally, putting this together gives

$$F_X(x) = \begin{cases} 0 & x < 0 \\ x^2/20 & 0 \le x \le 2 \\ x/5 - 1/5 & 2 < x \le 6 \\ 1 & x > 6 \end{cases}$$
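The pieces of this solution can be checked with exact arithmetic. The sketch below assumes Python's standard `fractions` module; the names `package_pdf` and `package_cdf` are illustrative. The probability computed at the end uses the property $P(a \le X \le b) = F_X(b) - F_X(a)$ from Definition 4.1.3 and is offered only as a check, not as the notes' own solution.

```python
from fractions import Fraction

C = Fraction(1, 5)  # value of c found in part 2

def package_pdf(x):
    """Density of the delivery time (hours past noon)."""
    x = Fraction(x)
    if 0 <= x <= 2:
        return x / 10
    if 2 < x <= 6:
        return C
    return Fraction(0)

def package_cdf(x):
    """Piecewise CDF assembled in part 3."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x <= 2:
        return x * x / 20
    if x <= 6:
        return x / 5 - Fraction(1, 5)
    return Fraction(1)

# Total area: triangle on [0, 2] plus rectangle on [2, 6].
triangle_area = Fraction(1, 2) * 2 * package_pdf(2)
rectangle_area = (6 - 2) * C
total_area = triangle_area + rectangle_area

# P(2 <= X <= 6) via the CDF, i.e. delivery between 2pm and 6pm.
prob_2_to_6 = package_cdf(6) - package_cdf(2)

print(total_area, package_cdf(2), package_cdf(6), prob_2_to_6)
```

The two middle pieces of the CDF agree at x = 2 (both give 1/5) and the CDF reaches 1 at x = 6, which is what a correctly assembled piecewise CDF should do.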


Solution: We need the total area under the curve to be 1, so

$$\int_{-\infty}^{\infty} f_X(y)\,dy = \int_0^9 c y^2\,dy = c\left[\frac{y^3}{3}\right]_0^9 = c \cdot \frac{729}{3} = 243c$$

Hence, c = 1/243. The expected value is the weighted average of each point weighted by its density, so

$$E[X] = \int_{-\infty}^{\infty} z f_X(z)\,dz = \int_0^9 z \cdot \frac{1}{243} z^2\,dz = \frac{1}{243}\int_0^9 z^3\,dz$$

Similarly, by LOTUS,

$$E\left[X^2\right] = \int_{-\infty}^{\infty} z^2 f_X(z)\,dz = \int_0^9 z^2 \cdot \frac{1}{243} z^2\,dz = \frac{1}{243}\int_0^9 z^4\,dz$$

Finally, we can set $\mathrm{Var}(X) = E\left[X^2\right] - E[X]^2$.

For the CDF, we know that

$$F_X(t) = P(X \le t) = \int_{-\infty}^{t} f_X(y)\,dy$$

We actually have three cases, similar to the example earlier. If t < 0, FX (t) = 0 since there’s no way to get a negative number (the range is ΩX = [0, 9]). If t > 9, FX (t) = 1 since we are guaranteed to get a number less than t. And for 0 ≤ t ≤ 9, we just do a normal integral to get that

$$F_X(t) = P(X \le t) = \int_{-\infty}^{t} f_X(s)\,ds = \int_{-\infty}^{0} f_X(s)\,ds + \int_0^t f_X(s)\,ds = 0 + \int_0^t c s^2\,ds = \frac{c}{3} t^3$$

Putting this together gives:

$$F_X(t) = \begin{cases} 0 & t < 0 \\ \frac{c}{3} t^3 & 0 \le t \le 9 \\ 1 & t > 9 \end{cases}$$
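The integrals set up in this solution are all integrals of powers of z over [0, 9], so they can be evaluated exactly. The sketch below assumes Python's standard `fractions` module; `power_integral` and `cdf` are illustrative helper names, and the evaluated numbers go one step beyond the "set up" that the exercise asks for.

```python
from fractions import Fraction

UPPER = 9  # the density is c*y^2 on [0, 9] and 0 elsewhere

def power_integral(k, upper):
    """Exact value of the integral of z^k from 0 to `upper` (k >= 0)."""
    return Fraction(upper) ** (k + 1) / (k + 1)

c = 1 / power_integral(2, UPPER)              # normalization: c * (integral of y^2) = 1
mean = c * power_integral(3, UPPER)           # E[X]   = c * (integral of z^3)
second_moment = c * power_integral(4, UPPER)  # E[X^2] = c * (integral of z^4), by LOTUS
variance = second_moment - mean ** 2          # Var(X) = E[X^2] - E[X]^2

def cdf(t):
    """Piecewise CDF from the solution: (c/3) t^3 on [0, 9]."""
    t = Fraction(t)
    if t < 0:
        return Fraction(0)
    if t <= UPPER:
        return c / 3 * t ** 3
    return Fraction(1)

print(c, mean, second_moment, variance)  # 1/243 27/4 243/5 243/80
print(cdf(9))                            # 1
```

With c = 1/243 the middle piece of the CDF is $t^3/729$, which reaches exactly 1 at t = 9, as it should.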

  1. Suppose X is continuous with pdf

$$f_X(x) = \begin{cases} c/x^2 & 1 \le x < \infty \\ 0 & \text{otherwise} \end{cases}$$

Write an expression for the value of c that makes $f_X$ a valid pdf, and set up expressions (integrals) for its mean and variance. Also, find the cdf of X, $F_X$.

Solution: We need the total area under the curve to be 1, so

$$\int_{-\infty}^{\infty} f_X(y)\,dy = \int_1^{\infty} \frac{c}{y^2}\,dy = -c\left[\frac{1}{y}\right]_1^{\infty} = -c(0 - 1) = c$$

Hence, c = 1. The expected value is the weighted average of each point weighted by its density, so

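$$E[X] = \int_{-\infty}^{\infty} z f_X(z)\,dz = \int_1^{\infty} z \cdot \frac{1}{z^2}\,dz = \int_1^{\infty} \frac{1}{z}\,dz = \left[\ln(z)\right]_1^{\infty} = \infty$$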


Actually, the mean and variance are undefined (since they are infinite)! If the integral for $E[X]$ did not converge, then the integral for $E\left[X^2\right]$ had no chance either (try it)! For the cdf, we know that

$$F_X(t) = P(X \le t) = \int_{-\infty}^{t} f_X(y)\,dy$$

We actually have two cases. If t < 1, $F_X(t) = 0$ since there’s no way to get a number less than 1 (the range is $\Omega_X = [1, \infty)$). For t ≥ 1, we just do a normal integral to get that

$$F_X(t) = P(X \le t) = \int_{-\infty}^{t} f_X(s)\,ds = \int_{-\infty}^{1} f_X(s)\,ds + \int_1^t f_X(s)\,ds = \int_1^t \frac{1}{s^2}\,ds = -\left[\frac{1}{s}\right]_1^t = -\left(\frac{1}{t} - 1\right) = 1 - \frac{1}{t}$$

Putting this together gives:

$$F_X(t) = \begin{cases} 0 & t < 1 \\ 1 - \frac{1}{t} & t \ge 1 \end{cases}$$
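To see why this density is valid even though its mean is infinite, it helps to look at what happens as the upper limit grows. The sketch below uses only Python's standard `math` module; the function names are illustrative. The truncated mean uses the closed form $\int_1^T z \cdot \frac{1}{z^2}\,dz = \ln(T)$, which follows from the integral for $E[X]$ above.

```python
import math

def inverse_square_cdf(t):
    """CDF from the solution: 0 for t < 1, and 1 - 1/t for t >= 1."""
    return 0.0 if t < 1 else 1.0 - 1.0 / t

def truncated_mean_integral(upper):
    """Integral of z * (1/z^2) = 1/z from 1 to `upper`, which equals ln(upper)."""
    return math.log(upper)

for upper in (10, 10**3, 10**6, 10**9):
    # The CDF approaches 1, while the truncated mean keeps growing.
    print(upper, inverse_square_cdf(upper), truncated_mean_integral(upper))
```

The CDF column settles toward 1, so the total probability is finite, but the truncated mean grows without bound (slowly, like a logarithm), which is what it means for the integral for $E[X]$ to diverge.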