Discrete Random Variables: Definition, Probability Mass Function, and Expectation, Study notes of Statistics

This document covers discrete random variables: their definition, probability mass function, and expectation. It includes examples of discrete random variables, such as the number of heads in coin flips, and calculates their probability mass functions and expectations. The document also explains the concept of independent events and how it applies to discrete random variables.

Typology: Study notes

2021/2022

Uploaded on 09/27/2022

fazal

Chapter 3. Discrete Random Variables

3.1: Discrete Random Variables Basics

(From “Probability & Statistics with Applications to Computing” by Alex Tsun)

3.1.1 Introduction to Discrete Random Variables

Suppose you flip a fair coin twice. Then the sample space is:

Ω = {HH, HT, TH, TT}

Sometimes, though, we don’t care about the order (HT vs TH), but just the fact that we got one head and one tail. So we can define a random variable as a numeric function of the outcome.

For example, we can define X to be the number of heads in the two independent flips of a fair coin. Then X is a function, X : Ω → R, which takes outcomes ω ∈ Ω and maps them to a number. For example, for the outcome HH, we have X(HH) = 2 since there are two heads. Likewise, X(HT) = X(TH) = 1 and X(TT) = 0.

X is an example of a random variable, which brings us to the following definition:

Definition 3.1.1: Random Variable

Suppose we conduct an experiment with sample space Ω. A random variable (rv) is a numeric function of the outcome, X : Ω → R. That is, it maps outcomes (ω ∈ Ω) to numbers: ω ↦ X(ω). The set of possible values X can take on is its range/support, denoted ΩX. If ΩX is finite or countably infinite (typically integers or a subset), X is a discrete random variable (drv). Else if ΩX is uncountably large (the size of real numbers), X is a continuous random variable.
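As an illustration, here is a minimal Python sketch of the definition, assuming the sample space is represented as a list of outcome strings; the names `omega`, `X`, and `support` are ours, not the text's. It shows that X is an ordinary deterministic function of the outcome, and that its range ΩX is just the set of values that function takes.

```python
# Sample space for two flips of a coin, one string per outcome.
omega = ["HH", "HT", "TH", "TT"]

def X(outcome: str) -> int:
    """Random variable: the number of heads in the outcome."""
    return outcome.count("H")

# The range/support of X is the set of values X takes on over the sample space.
support = sorted({X(w) for w in omega})

for w in omega:
    print(f"X({w}) = {X(w)}")
print("support:", support)  # support: [0, 1, 2]
```

Because `support` is finite, this X is a discrete random variable in the sense of the definition.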

2 Probability & Statistics with Applications to Computing 3.

Example(s)

Below are some descriptions of random variables. Find their ranges and classify them as a discrete random variable (DRV) or continuous random variable (CRV). The first row is filled out for you as an example!

| RV | Description | Range | DRV or CRV? |
|----|-------------|-------|-------------|
| X | the # of heads in n flips of a fair coin | {0, 1, ..., n} | DRV |
| N | the # of people born this year | TODO | TODO |
| F | the # of flips of a fair coin up to and including my first head | TODO | TODO |
| B | the amount of time I wait for the next bus, in seconds | TODO | TODO |
| C | the temperature in Celsius of liquid water | TODO | TODO |

Solution

Here is the solution in a table, with explanations below.

| RV | Description | Range | DRV or CRV? |
|----|-------------|-------|-------------|
| X | the # of heads in n flips of a fair coin | {0, 1, ..., n} | DRV |
| N | the # of people born this year | {0, 1, 2, ...} | DRV |
| F | the # of flips of a fair coin up to and including my first head | {1, 2, ...} | DRV |
| B | the amount of time I wait for the next bus, in seconds | [0, ∞) | CRV |
| C | the temperature in Celsius of liquid water | (0, 100) | CRV |

  • The range of X is ΩX = {0, 1, ..., n} because there could be anywhere from 0 to n heads flipped. It is a discrete random variable because it takes on finitely many (n + 1) values.
  • The range of N is ΩN = {0, 1, 2, ...} because there is no upper bound on the number of people that can be born. This is countably infinite (an infinite subset of the integers), so N is a discrete random variable.
  • The range of F is ΩF = {1, 2, ...} because it takes at least 1 flip to get a head, and there is no upper bound: we could flip tails arbitrarily many times in a row before the first head (although the chance is low). This is still countable as a subset of the integers, so F is a discrete random variable.
  • The range of B is ΩB = [0, ∞), as there could be partial seconds waited, and the wait could be anywhere from 0 seconds to arbitrarily long. This is a continuous random variable because there are uncountably many values in this range.
  • The range of C is ΩC = (0, 100) because the temperature can be any real number in this range. It cannot be 0 or below because that would be frozen (ice), nor can it be 100 or above because this would be boiling (steam). This is a continuous random variable.

3.1.2 Probability Mass Functions

Let’s return to X, which we defined to be the number of heads in the flip of two fair coins. We already determined that Ω = {HH, HT, TH, TT} and X(HH) = 2, X(HT) = 1, X(TH) = 1 and X(TT) = 0. The range, ΩX, is {0, 1, 2}.
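The PMF of this X can be computed mechanically: pX(k) = P(X = k) is the sum of P(ω) over the outcomes ω with X(ω) = k. Below is a minimal Python sketch, assuming the four outcomes are equally likely and using exact fractions; the helper `pmf` is a hypothetical name for illustration. For this X it gives pX(0) = 1/4, pX(1) = 1/2, and pX(2) = 1/4.

```python
from collections import defaultdict
from fractions import Fraction

# Two flips of a fair coin: each of the 4 outcomes has probability 1/4.
P = {w: Fraction(1, 4) for w in ["HH", "HT", "TH", "TT"]}

def X(outcome: str) -> int:
    """Random variable: the number of heads in the outcome."""
    return outcome.count("H")

def pmf(prob, rv):
    """Hypothetical helper: p(k) = sum of P(w) over all outcomes w with rv(w) == k."""
    p = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at 0
    for w, pw in prob.items():
        p[rv(w)] += pw
    return dict(sorted(p.items()))

p_X = pmf(P, X)
print(p_X)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```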

[…]

must choose 9 as one of the balls, and the other 2 from 1-8. So we have

$$p_Y(k) = \frac{\binom{k-1}{2}}{\binom{\cdots}{3}}, \quad k \in \Omega_Y$$

In our next example, we will briefly introduce one more concept called the CDF of a random variable. We will discuss it a lot more in depth in 4.1 when we discuss continuous RVs!

Example(s)

Suppose there are three students, and their hats are returned randomly with each of the 3! permutations equally likely. Let X be the number of hats returned to the correct owner.

  1. List all 3! = 6 elements of Ω, the sample space of the experiment, as permutations of the numbers 1, 2, and 3.
  2. Find the range ΩX (be careful) and PMF pX.
  3. The cumulative distribution function (CDF) of a random variable X is defined to be FX : R → [0, 1] such that FX (t) = P (X ≤ t) (again, t is a dummy letter and we could have chosen any). Find the CDF FX.

Solution

  1. The sample space is Ω = {123, 132, 213, 231, 312, 321}. For example, 123 means that everyone got their own hat back, and 321 means only person 2 got their own hat back.
  2. We construct the following table with 6 rows: one for each outcome ω.

| ω | X(ω) | P(ω) | Explanation |
|---|------|------|-------------|
| 123 | 3 | 1/6 | All 3 people got their hat back. |
| 132 | 1 | 1/6 | Only person 1 got their hat back. |
| 213 | 1 | 1/6 | Only person 3 got their hat back. |
| 231 | 0 | 1/6 | No one got their hat back. |
| 312 | 0 | 1/6 | No one got their hat back. |
| 321 | 1 | 1/6 | Only person 2 got their hat back. |

Note that it isn’t possible for X to equal 2: if 2 people out of 3 have their hat back, then the third person must also have their own hat! So ΩX = {0, 1, 3}. Let’s work on each:

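  • $$p_X(0) = P(X = 0) = \sum_{\omega \in \Omega : X(\omega) = 0} P(\omega) = P(231) + P(312) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6}$$
  • $$p_X(1) = P(X = 1) = \sum_{\omega \in \Omega : X(\omega) = 1} P(\omega) = P(132) + P(213) + P(321) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6}$$
  • $$p_X(3) = P(X = 3) = \sum_{\omega \in \Omega : X(\omega) = 3} P(\omega) = P(123) = \frac{1}{6}$$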

So our final PMF is:

$$p_X(k) = \begin{cases} 2/6 & k = 0 \\ 3/6 & k = 1 \\ 1/6 & k = 3 \end{cases}$$


  3. Notice that the CDF is defined for ALL real numbers R = (−∞, +∞), unlike PMFs. So we’ll have to specify FX(t) = P(X ≤ t) for values of t that are not even in the range ΩX, including decimal numbers! This sounds nearly impossible, but it’s actually not too bad! Let’s start by seeing some example values.
    - FX(−3.642) = P(X ≤ −3.642) = 0 because there is no way that X ≤ −3.642. In fact, FX(t) = 0 for any t < 0, since the lowest possible value of X is 0.
    - FX(0.724) = P(X ≤ 0.724) = P(X = 0) = 2/6 because the only way that X ≤ 0.724 is if X = 0, which happens with probability 2/6. In fact, FX(t) = 2/6 for any 0 ≤ t < 1 for the same reason!
    - FX(2.999) = P(X ≤ 2.999) = P(X = 0) + P(X = 1) = 2/6 + 3/6 = 5/6 because X ≤ 2.999 only if X = 0 or X = 1. And again, for any 1 ≤ t < 3, we have FX(t) = 5/6.
    - Finally, FX(235.23) = P(X ≤ 235.23) = P(X = 0) + P(X = 1) + P(X = 3) = 1 because X must be in its range ΩX = {0, 1, 3}, so X ≤ 235.23 is guaranteed. The same holds for any t ≥ 3, so FX(t) = 1 for any t ≥ 3.

Putting this all together gives:

$$F_X(t) = \begin{cases} 0 & t < 0 \\ 2/6 & 0 \le t < 1 \\ 5/6 & 1 \le t < 3 \\ 1 & t \ge 3 \end{cases}$$

See the picture below for a plot of the PMF and CDF!

You’ll notice the CDF is always between 0 and 1 because it is a probability! It is also nondecreasing, since we are only adding more and more cumulative probabilities (which are nonnegative). Notice that at each jump of the CDF, the height of the jump is just the PMF at that point (why?)! Again, we’ll talk more about CDFs in 4.1, so treat this as foreshadowing!
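The hat example can be checked by brute force. This Python sketch (the names `omega`, `X`, `p_X`, and `cdf` are illustrative, not from the text) enumerates the 3! permutations, builds the PMF by counting, and evaluates the CDF FX(t) = P(X ≤ t) at the same points used above. `Fraction` reduces 2/6 to 1/3 and 3/6 to 1/2, so the printed CDF values are 0, 1/3, 5/6, and 1. The final loop checks that each jump of the CDF equals the PMF at that point.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Each permutation w says which hat each person receives: w[i] is person (i+1)'s hat.
omega = list(permutations([1, 2, 3]))
prob = Fraction(1, len(omega))  # all 3! = 6 permutations are equally likely

def X(w):
    """Number of people who get their own hat back."""
    return sum(1 for person, hat in enumerate(w, start=1) if person == hat)

# PMF: p_X(k) = (number of outcomes with X(w) = k) * 1/6.
counts = Counter(X(w) for w in omega)
p_X = {k: counts[k] * prob for k in sorted(counts)}
print(p_X)  # {0: Fraction(1, 3), 1: Fraction(1, 2), 3: Fraction(1, 6)}

def cdf(t: float) -> Fraction:
    """F_X(t) = P(X <= t): add up p_X(k) over every k in the range with k <= t."""
    return sum((p for k, p in p_X.items() if k <= t), Fraction(0))

for t in [-3.642, 0.724, 2.999, 235.23]:
    print(t, cdf(t))

# The jump of F_X at each k equals p_X(k). Evaluating just below k with k - 0.5
# works here only because the values in the range are at least 1 apart.
for k, p in p_X.items():
    assert cdf(k) - cdf(k - 0.5) == p
```

Note that 2 never appears as a key of `p_X`, matching the observation that X = 2 is impossible.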

3.1.3 Expectation

We have this idea of a random variable, which is actually neither random nor a variable (it’s a deterministic function X : Ω → ΩX). However, the way I like to think about it is: it is a random quantity which we do

[…]

the correct owner.” The range was ΩX = {0, 1, 3} and the PMF was

$$p_X(k) = \begin{cases} 2/6 & k = 0 \\ 3/6 & k = 1 \\ 1/6 & k = 3 \end{cases}$$

Find the expected number of people who get their hat back, E [X].

Solution

Typically, the second definition of expectation is easier to use since it has fewer terms to sum over. We take the sum of each value in ΩX multiplied by its probability.

$$E[X] = \sum_{k \in \{0, 1, 3\}} k \cdot p_X(k) = 0 \cdot \frac{2}{6} + 1 \cdot \frac{3}{6} + 3 \cdot \frac{1}{6} = \frac{6}{6} = 1$$

That is, if we return 3 hats randomly to the 3 students, we expect on average that 1 student will get their own hat back. It turns out that, no matter how many students/hats there are, the answer is always 1; how amazing! We’ll actually prove this fact in section 3.3, so stay tuned!
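The expectation is a one-line sum over the PMF. The sketch below also brute-forces E[X] for n = 1, ..., 6 people by averaging the number of matches over all n! permutations; this is only a numerical sanity check of the “always 1” claim for small n, not the proof promised for section 3.3, and `expected_fixed_points` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations

# PMF of X from the text: the expectation is the sum of k * p_X(k) over the range.
p_X = {0: Fraction(2, 6), 1: Fraction(3, 6), 3: Fraction(1, 6)}
E_X = sum(k * p for k, p in p_X.items())
print(E_X)  # 1

def expected_fixed_points(n: int) -> Fraction:
    """E[# people who get their own hat back] with n people, by brute force
    over all n! equally likely ways to return the hats."""
    perms = list(permutations(range(n)))
    matches = sum(sum(1 for person, hat in enumerate(w) if person == hat) for w in perms)
    return Fraction(matches, len(perms))

# Check the "always 1" claim for small n (a sanity check, not a proof).
for n in range(1, 7):
    print(n, expected_fixed_points(n))
```

Brute force grows as n!, so this approach is only practical for small n.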

Example(s)

There are 3 people in Linbo’s family: his mom, dad, and sister. Each family member decides whether or not they want to come to lunch at his social-distancing home restaurant, independently of the others.

  • Mom wants to come with probability 0.8.
  • Dad wants to come with probability 0.6.
  • Sister wants to come with probability 0.1.

Unfortunately, if all 3 of them want to come, he must turn one of them away since the restaurant capacity is 2 guests. Otherwise, he will take everyone that comes. Let X be the number of customers that Linbo serves at lunch.

  1. What is the range ΩX , the PMF pX (k) and expectation E [X]?
  2. If he charges everyone who comes $10, but it costs him $50 to make all the food, what is his expected profit (this could be negative)?

Solution

  1. The range is ΩX = {0, 1, 2} since we can have anywhere from 0 to 2 people. Let M, D, S be the events that his mom, dad, and sister want to come, respectively. By independence, the probability no one comes is:

$$p_X(0) = P(X = 0) = P(M^C, D^C, S^C) = P(M^C)\,P(D^C)\,P(S^C) = 0.2 \cdot 0.4 \cdot 0.9 = 0.072$$

The probability that exactly one person comes has three cases: only mom comes, only dad comes, or only sister comes:

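$$p_X(1) = P(X = 1) = P(M, D^C, S^C) + P(M^C, D, S^C) + P(M^C, D^C, S) = 0.8 \cdot 0.4 \cdot 0.9 + 0.2 \cdot 0.6 \cdot 0.9 + 0.2 \cdot 0.4 \cdot 0.1 = 0.288 + 0.108 + 0.008 = 0.404$$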


Finally, for pX (2), we have some work to do. We can sum over the three cases where exactly 2 of the 3 want to come. But if all 3 want to come (P (M, D, S)), this also counts as X = 2 since we turn one of them away! So we actually add 4 probabilities to get pX (2). Alternatively, we know that these three probabilities must sum to 1: pX (0) + pX (1) + pX (2) = 1, and hence using our previous computations:

$$p_X(2) = 1 - p_X(0) - p_X(1) = 1 - 0.072 - 0.404 = 0.524$$

So our PMF is:

$$p_X(k) = \begin{cases} 0.072 & k = 0 \\ 0.404 & k = 1 \\ 0.524 & k = 2 \end{cases}$$

The expectation is

$$E[X] = \sum_{k \in \Omega_X} k \cdot p_X(k) = 0 \cdot 0.072 + 1 \cdot 0.404 + 2 \cdot 0.524 = 1.452$$

So we expect 1.452 people to come!

  2. We’d intuitively like to say something like: the profit is P = 10X − 50, so

$$E[P] = E[10X - 50] = 10\,E[X] - 50 = 14.52 - 50 = -35.48$$

But is this step valid: E [10X − 50] = 10E [X] − 50? Yes, and it is called linearity of expectation! This is one of the most important theorems on expectation, and is covered in the next section.

The “proper” way to do this expectation right now is to start over and find the range, PMF, and expectation of P. That is, ΩP = {−50, −40, −30} since these are the possible profits if 0, 1, or 2 people come. Then,

$$p_P(k) = \begin{cases} 0.072 & k = -50 \\ 0.404 & k = -40 \\ 0.524 & k = -30 \end{cases}$$

You can now check that computing the expectation using the usual formula gives the same answer!
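Both parts of Linbo’s example can be verified by enumerating all 2³ = 8 combinations of who wants to come. In this Python sketch (names such as `probs`, `CAPACITY`, `p_X`, and `p_P` are ours), independence means each combination’s probability is a product of the individual probabilities, and the capacity rule caps the number served at 2. It then builds the PMF of the profit P = 10X − 50 directly and compares E[P] with 10E[X] − 50, the linearity shortcut mentioned above.

```python
from fractions import Fraction
from itertools import product

# P(wants to come) for mom, dad, sister, from the example; choices are independent.
probs = [Fraction(8, 10), Fraction(6, 10), Fraction(1, 10)]
CAPACITY = 2  # the restaurant seats at most 2 guests

# PMF of X: enumerate all 2^3 combinations of who wants to come (1 = wants to come).
p_X = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
for wants in product([0, 1], repeat=3):
    pr = Fraction(1)
    for w, p in zip(wants, probs):
        pr *= p if w else 1 - p  # independence: multiply the individual probabilities
    served = min(sum(wants), CAPACITY)  # if all 3 want to come, one is turned away
    p_X[served] += pr
E_X = sum(k * p for k, p in p_X.items())

# Profit P = 10X - 50: build its PMF, then apply the definition of expectation.
p_P = {}
for x, p in p_X.items():
    profit = 10 * x - 50
    p_P[profit] = p_P.get(profit, Fraction(0)) + p
E_P = sum(k * p for k, p in p_P.items())

print({k: float(p) for k, p in p_X.items()})  # {0: 0.072, 1: 0.404, 2: 0.524}
print(float(E_X))                              # 1.452
print(float(E_P), float(10 * E_X - 50))        # -35.48 -35.48
```

Exact fractions are used so the two routes to E[P] can be compared with `==` rather than a floating-point tolerance.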

3.1.4 Exercises

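  1. Let X be the value of a single roll of a fair six-sided die. What are the range ΩX, the PMF pX(k), and the expectation E[X]?

Solution: The range is ΩX = {1, 2, 3, 4, 5, 6}. The PMF is

$$p_X(k) = \frac{1}{6}, \quad k \in \Omega_X$$

The expectation is

$$E[X] = \sum_{k \in \Omega_X} k \cdot p_X(k) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{21}{6} = 3.5$$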

This kind of makes sense, right? You expect the “middle number” between 1 and 6, which is 3.5.
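A quick Python sketch of the same computation with exact fractions, followed by a simulation, assuming Python’s `random` module as the source of randomness (seeded so runs are reproducible). The simulated average of many rolls should land close to 3.5, which is one way to read E[X] as a long-run average; the exact simulated value depends on the seed.

```python
import random
from fractions import Fraction

# Exact expectation: each face k in {1, ..., 6} has probability 1/6.
E_X = sum(k * Fraction(1, 6) for k in range(1, 7))
print(E_X)  # 7/2

# Simulation: the average of many rolls should be close to E[X].
rng = random.Random(0)  # fixed seed so the run is reproducible
rolls = [rng.randint(1, 6) for _ in range(100_000)]
average = sum(rolls) / len(rolls)
print(average)
```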

[…]

10? And if p = 1/7, maybe 7? So it seems like our guess should be E[X] = 1/p. It turns out this intuition is actually correct!

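$$\begin{aligned}
E[X] &= \sum_{k \in \Omega_X} k \cdot p_X(k) && \text{[def of expectation]} \\
&= \sum_{k=1}^{\infty} k (1-p)^{k-1} p \\
&= p \sum_{k=1}^{\infty} k (1-p)^{k-1} && \text{[$p$ is a constant with respect to $k$]} \\
&= p \sum_{k=1}^{\infty} \frac{d}{dp}\left(-(1-p)^{k}\right) && \left[\tfrac{d}{dy}\, y^k = k y^{k-1}\right] \\
&= -p \frac{d}{dp} \sum_{k=0}^{\infty} (1-p)^{k} && \text{[swap sum and derivative; the added $k = 0$ term is the constant 1, whose derivative is 0]} \\
&= -p \frac{d}{dp} \left(\frac{1}{1-(1-p)}\right) && \left[\text{geometric series formula: } \sum_{i=0}^{\infty} r^i = \frac{1}{1-r}\right] \\
&= -p \frac{d}{dp} \left(\frac{1}{p}\right) \\
&= -p \cdot \left(-\frac{1}{p^2}\right) \\
&= \frac{1}{p}
\end{aligned}$$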
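A numerical sanity check of E[X] = 1/p for the PMF pX(k) = (1 − p)^(k−1) p used in the derivation. The sketch truncates the infinite series after a fixed number of terms (1000 by default, an arbitrary choice); because (1 − p)^(k−1) shrinks geometrically, the discarded tail is negligible for the values of p tried here. The function name `geometric_expectation_partial` is hypothetical.

```python
def geometric_expectation_partial(p: float, terms: int = 1000) -> float:
    """Partial sum of sum_{k=1}^{terms} k * (1 - p)^(k - 1) * p."""
    total = 0.0
    tail = 1.0  # holds (1 - p)^(k - 1); updated by repeated multiplication
    for k in range(1, terms + 1):
        total += k * tail * p
        tail *= 1 - p
    return total

# Compare the truncated series with the closed form 1/p.
for p in [0.5, 0.1, 1 / 7]:
    print(p, geometric_expectation_partial(p), 1 / p)
```

Updating `tail` by multiplication, rather than recomputing a power each step, keeps the loop linear in `terms` and lets very small values fade quietly to zero.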