Last Updated: September 25,2019

Course: Introduction to Stochastic Processes
Term: Fall 2019
Instructor: Gordan Žitković

Lecture 1

Discrete random variables

1.1 Random Variables

A large chunk of probability is about random variables. Instead of giving a precise definition, let us just mention that a random variable can be thought of as an uncertain quantity (usually numerical, i.e., with values in the set of real numbers R, but not always).

While it is true that we do not know with certainty what value a random variable X will take, we usually know how to assign a number - the probability - that its value will be in some^1 subset of R. For example, we might be interested in P[X ≥ 7], P[X ∈ [2, 3.1]] or P[X ∈ {1, 2, 3}].

Random variables are usually divided into discrete and continuous, even though there exist random variables which are neither discrete nor continuous. Those can be safely neglected for the purposes of this course, but they play an important role in many areas of probability and statistics.

1.2 Discrete random variables

Before we define discrete random variables, we need some vocabulary.

Definition 1.2.1. Given a set B, we say that the random variable X is B-valued if P[X ∈ B] = 1.

In words, X is B-valued if we know for a fact that X will never take a value outside of B.

Definition 1.2.2. A random variable X is said to be discrete if there exists a set S such that S is either finite or countable^a and X is S-valued.

(^a) Countable means that its elements can be enumerated by the natural numbers. The only (infinite) countable sets we will need are N = {1, 2, ...} or N_0 = {0, 1, 2, ...}.

(^1) We will not worry about measurability and similar subtleties in this class.

Definition 1.2.3. The support S_X of the discrete random variable X is the smallest set S such that X is S-valued.

Example 1.2.4. A die is thrown and the number obtained is recorded and denoted by X. The possible values of X form the set S = {1, 2, 3, 4, 5, 6} and each happens with probability 1/6, so X is certainly S-valued. Since S is finite, X is discrete. One still needs to argue that S is the support S_X of X. The alternative would be that S_X is a proper subset of S, i.e., that there are redundant elements in S. This is not the case since all elements in S are “important”, i.e., happen with positive probability. If we remove anything from S, we are omitting a possible value for X. On the other hand, it is certainly true that X always takes its values in the finite set S′ = {1, 2, 3, 4, 5, 6, 7}, i.e., that X is S′-valued. One has to be careful with the terminology here: it is correct to say that X is an S′-valued (or even N-valued) random variable, even though it only takes the values 1, 2, ..., 6 with positive probabilities.

Discrete random variables are very nice due to the following fact: in order to be able to compute any conceivable probability involving a discrete random variable X, it is enough to know how to compute the probabilities P[X = x], for all x ∈ S_X. Indeed, if you are interested in figuring out what P[X ∈ B] is, for some set B ⊆ R (e.g., B = {5, 6, 7}, B = [3, 6], or B = [−2, ∞)), we simply pick all x ∈ S_X which are also in B and sum their probabilities. In mathematical notation, we have

$$P[X \in B] = \sum_{x \in S_X \cap B} P[X = x]. \tag{1.2.1}$$
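As a small illustration of formula (1.2.1), the following Python sketch computes P[X ∈ B] for the fair die of Example 1.2.4 by summing the pmf over the support points that lie in B. The names `pmf` and `prob`, and the choice to describe B by a membership test, are assumptions made for this illustration and are not part of the notes.

```python
from fractions import Fraction

# pmf of a fair die: support {1, ..., 6}, each value with probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def prob(pmf, in_B):
    """Return P[X in B]: the sum of P[X = x] over support points x with in_B(x) true."""
    return sum((p for x, p in pmf.items() if in_B(x)), Fraction(0))

p_set = prob(pmf, lambda x: x in {5, 6, 7})    # B = {5, 6, 7}
p_interval = prob(pmf, lambda x: 3 <= x <= 6)  # B = [3, 6]
p_halfline = prob(pmf, lambda x: x >= -2)      # B = [-2, infinity)
print(p_set, p_interval, p_halfline)  # 1/3 2/3 1
```

Note that the point 7 in B = {5, 6, 7} contributes nothing, since it is not in the support.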

Definition 1.2.5. The probability mass function (pmf) of a discrete random variable X is a function p_X defined on the support S_X of X by

$$p_X(x) = P[X = x], \quad x \in S_X.$$

In practice, we usually present the pmf p_X in the form of a table (called the distribution table) as

$$X \sim \begin{array}{c|cccc} x & x_1 & x_2 & x_3 & \dots \\ \hline p_X(x) & p_1 & p_2 & p_3 & \dots \end{array}$$

or, simply,

$$X \sim \begin{array}{cccc} x_1 & x_2 & x_3 & \dots \\ \hline p_1 & p_2 & p_3 & \dots \end{array}$$

difference is that the result is not a number anymore. The set S of all possible values can be represented as the set of all pairs like (♠, 7), where the first entry denotes the picked card’s suit (in {♥, ♠, ♣, ♦}), and the second is a number between 1 and 13. It is, of course, possible to use different conventions and use the set {2, 3,... , 9, 10, J, Q, K, A} for the second component. The point is that the values X takes are not numbers.

1.3 Events and Bernoulli random variables

Random variables X which can only take one of the two values 0, 1, i.e., for which S_X ⊆ {0, 1}, are called indicators or Bernoulli random variables and are very useful in probability and statistics (and elsewhere). The name comes from the fact that you should think of such variables as signal lights; if X = 1 an event of interest has happened, and if X = 0 it has not happened. In other words, X indicates the occurrence of an event. One reason the Bernoulli random variables are so useful is that they let us manipulate events without ever leaving the language of random variables. Here is an example:

Example 1.3.1. Suppose that two dice are thrown so that X_1 and X_2 are the numbers obtained (both X_1 and X_2 are discrete random variables with S_{X_1} = S_{X_2} = {1, 2, 3, 4, 5, 6}). If we are interested in the probability that their sum is at least 9, we proceed as follows. We define the random variable W - the sum of X_1 and X_2 - by W = X_1 + X_2. Another random variable, let us call it X, is a Bernoulli random variable defined by

$$X = \begin{cases} 1, & W \ge 9, \\ 0, & W < 9. \end{cases}$$

With such a set-up, X signals whether the event of interest has happened, and we can state our original problem in terms of X, namely “Compute P[X = 1]!”.
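To make the set-up of Example 1.3.1 concrete, here is a minimal Python sketch that evaluates P[X = 1] by listing all outcomes. It assumes that the two dice are fair and independent, so that all 36 pairs are equally likely; the example itself does not state this, and the function name `indicator` is chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 outcomes (x1, x2) of two dice; assumed equally likely (fair, independent dice).
outcomes = list(product(range(1, 7), repeat=2))

def indicator(x1, x2):
    """Bernoulli variable X: 1 if W = x1 + x2 >= 9, and 0 otherwise."""
    return 1 if x1 + x2 >= 9 else 0

# P[X = 1] = (number of outcomes with X = 1) / (number of outcomes)
p_one = Fraction(sum(indicator(x1, x2) for x1, x2 in outcomes), len(outcomes))
print(p_one)  # 5/18
```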

This example is, admittedly, a little contrived. The point, however, is that anything can be phrased in terms of random variables; thus, if you know how to work with random variables, i.e., know how to compute their distributions, you can solve any problem in probability that comes your way. Another reason Bernoulli random variables are useful is the fact that we can do arithmetic with them.

Example 1.3.2. 70 coins are tossed and their outcomes are denoted by W_1, W_2, ..., W_70. All W_i are random variables with values in {H, T} (and therefore not Bernoulli random variables), but they can be easily recoded into Bernoulli random variables as follows:

$$X_i = \begin{cases} 1, & \text{if } W_i = H, \\ 0, & \text{if } W_i = T. \end{cases}$$

Once you have the “dictionary” {1 ↔ H, 0 ↔ T}, random variables X_i and W_i carry exactly the same information. The advantage of using X_i is that the random variable

$$N = \sum_{i=1}^{70} X_i,$$

which takes values in S_N = {0, 1, 2, ..., 70}, counts the number of heads among W_1, ..., W_70. Similarly, the random variable

$$M = X_1 \times X_2 \times \dots \times X_{70}$$

is a Bernoulli random variable itself. What event does it indicate?
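The recoding in Example 1.3.2 is easy to try out on a computer. The sketch below simulates one run of 70 tosses and builds N and M exactly as defined above. The fair coin and the fixed random seed are assumptions made only so that the run is reproducible; experimenting with different sequences W is one way to approach the question about M.

```python
import random

rng = random.Random(0)  # fixed seed (an arbitrary choice) for reproducibility

# W_i: outcomes of 70 tosses with values in {"H", "T"}; a fair coin is assumed.
W = [rng.choice("HT") for _ in range(70)]

# Recode into Bernoulli variables using the dictionary 1 <-> H, 0 <-> T.
X = [1 if w == "H" else 0 for w in W]

N = sum(X)  # the sum X_1 + ... + X_70

M = 1
for x in X:
    M *= x  # the product X_1 * X_2 * ... * X_70

print(N, W.count("H"), M)
```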

1.4 Some widely used discrete random variables

The distribution of a random variable is sometimes defined as “the collection of all possible probabilities associated to it”. This sounds a bit abstract, and, at least in the discrete case, obscures the practical significance of this important concept. We have learned that for discrete variables the knowledge of the pmf or the distribution table (such as the one in part 1., 2. or 3. of Example 1. 6) amounts to the knowledge of the whole distribution. It turns out that many random variables in widely different contexts come with the same (or similar) distribution tables, and that some of those appear so often that they deserve to be named (so that we don’t have to write the distribution table every time). The following example lists some of those named distributions. There are many others, but we will not need them in these notes.

Example 1.4.1.

  1. Bernoulli distribution. We have already encountered this distribution in our discussion of indicator random variables above. It is characterized by the distribution table of the form

$$\begin{array}{cc} 0 & 1 \\ \hline 1-p & p \end{array}$$

Recall that the binomial distribution arises as the “number of successes in n independent Bernoulli trials”, i.e., it counts the number of H in n independent tosses of a biased coin whose probability of H is p.

  3. Geometric distribution. The geometric distribution is similar to the binomial in that it is built out of the sequence of “successes” and “failures” in independent, repeated Bernoulli trials. The difference is that the number of trials is no longer fixed (i.e., equal to n), but we keep tossing until we get our first success. Since the trials are independent, if the probability of success in each trial is p ∈ (0, 1), the probability that there will be exactly k failures before the first success is q^k p, where q = 1 − p. Therefore, the geometric distribution - denoted by g(p) - comes with the following table

$$\begin{array}{ccccc} 0 & 1 & 2 & 3 & \dots \\ \hline p & qp & q^2 p & q^3 p & \dots \end{array}$$

Figure 2. The probability mass function (pmf) of a typical geometric distribution.

Caveat: When defining the geometric distribution, some people count the number of trials to the first success, i.e., add the final success into the count. This shifts everything by 1 and leads to a distribution with support N (and not N_0). While this is no big deal, this ambiguity tends to be confusing at times and leads to bugs in software. For us, the geometric distribution will always start from 0. The distribution which counts the final success will be referred to as the shifted geometric distribution, but we’ll try to avoid it altogether.

  4. Poisson distribution. This is also a family of distributions, parameterized by a single parameter λ > 0, and denoted by P(λ). Its support is N_0 and the distribution table is given by

$$\begin{array}{cccccc} 0 & 1 & 2 & 3 & 4 & \dots \\ \hline e^{-\lambda} & e^{-\lambda}\lambda & e^{-\lambda}\frac{\lambda^2}{2!} & e^{-\lambda}\frac{\lambda^3}{3!} & e^{-\lambda}\frac{\lambda^4}{4!} & \dots \end{array}$$

The closed form for the pmf is

$$p_X(x) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x \in \mathbb{N}_0.$$

The Poisson distribution arises as a limit when n → ∞ and p → 0 while np ∼ λ in the binomial distribution.

Figure 3. The probability mass function (pmf) of a typical Poisson distribution with λ > 1.
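Two points from the example above lend themselves to a quick numerical check: the shift between the two geometric conventions, and the Poisson distribution as a limit of binomials. The following Python sketch is one possible illustration. The function names and the values p = 0.3, λ = 2 and n = 10000 are arbitrary choices for this sketch, and the binomial pmf used is the standard one, which is not written out in this part of the notes.

```python
from math import comb, exp, factorial

def geometric_pmf(k, p):
    """P[X = k] in the convention of these notes: k failures before the first success, k = 0, 1, ..."""
    return (1 - p) ** k * p

def shifted_geometric_pmf(n, p):
    """P[number of trials = n] when the final success is counted, n = 1, 2, ..."""
    return geometric_pmf(n - 1, p)

def binomial_pmf(k, n, p):
    """Standard binomial pmf: probability of k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson pmf e^(-lam) * lam^k / k! for k = 0, 1, 2, ..."""
    return exp(-lam) * lam ** k / factorial(k)

p = 0.3
# The two conventions describe the same experiment; only the argument is shifted by 1.
same_event = shifted_geometric_pmf(1, p) == geometric_pmf(0, p)
# Partial sum of the geometric table; it should be very close to 1.
geom_total = sum(geometric_pmf(k, p) for k in range(200))

lam, n = 2.0, 10_000
# Largest pointwise gap between the b(n, lam/n) and P(lam) tables over k = 0, ..., 10.
max_gap = max(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(11))
print(same_event, geom_total, max_gap)
```

Since libraries differ in which geometric convention they use, it is worth checking the documentation before relying on a built-in geometric sampler or pmf.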

The fundamental properties of the variance/standard deviation are given in the following theorem^3:

Theorem 1.5.4. Suppose that X and Y are random variables and that α is a constant. Then

1. Var[αX] = α^2 Var[X], and

2. if, additionally, X and Y are independent, then

Var[X + Y] = Var[X] + Var[Y].

Caveat: These properties are not the same as the properties of the expectation. First of all, the constant comes out of the variance with a square, and second, the variance of the sum is the sum of the individual variances only if additional assumptions, such as the independence between the two variables, are imposed.
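Both properties of the variance, and the role of independence, can be checked on a small example. The sketch below uses two fair dice and exact fractions. It takes the variance to be E[(X − E[X])^2], the standard definition, which is assumed here because the definition does not appear in this part of the notes; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}  # pmf of a fair die

def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance as E[(X - E[X])^2] (standard definition, assumed here)."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def pmf_of_sum(pmf_x, pmf_y):
    """pmf of X + Y for independent X and Y: joint probabilities multiply."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, Fraction(0)) + px * py
    return out

alpha = 3
scaled = {alpha * x: p for x, p in die.items()}  # pmf of alpha * X
total = pmf_of_sum(die, die)                     # pmf of X + Y, with X and Y independent
double = {2 * x: p for x, p in die.items()}      # pmf of X + X (the summands are not independent)

print(variance(scaled) == alpha ** 2 * variance(die))  # True
print(variance(total) == 2 * variance(die))            # True
print(variance(double) == 2 * variance(die))           # False
```

The last line shows what goes wrong without independence: Var[X + X] = Var[2X] = 4 Var[X], not 2 Var[X].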

Finally, here is a very useful alternative formula for the variance of a random variable:

Proposition 1.5.5. Var[X] = E[X^2] − (E[X])^2.
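As a quick sanity check of Proposition 1.5.5, the following sketch computes the variance of a fair die in two ways: directly as E[(X − E[X])^2] (the standard definition, assumed here since it is not shown in this part of the notes) and through the alternative formula. The variable names are chosen only for this illustration.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

mean = sum(x * p for x, p in pmf.items())                # E[X]
second_moment = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2]

# Variance from the definition E[(X - E[X])^2]
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
# Variance from the alternative formula E[X^2] - (E[X])^2
var_shortcut = second_moment - mean ** 2

print(mean, second_moment, var_definition, var_shortcut)  # 7/2 91/6 35/12 35/12
```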

Let us compute expectations and variances/standard deviations for our most important examples.

Example 1.5.6.

  1. Bernoulli distribution. Let X ∼ B(p) be a Bernoulli random variable with parameter p. Then (remember q is a shortcut for 1 − p)

$$E[X] = 0 \times q + 1 \times p = p.$$

Using (1.5.5), we get

$$\operatorname{Var}[X] = E[X^2] - (E[X])^2 = 0^2 \times q + 1^2 \times p - p^2 = p - p^2 = p(1 - p) = pq,$$

and, so, sd[X] = √(pq).

(^3) We will talk about independence in detail in the next lecture. An intuitive understanding should be fine for now.

  2. Binomial distribution. Moving on to the binomial, X ∼ b(n, p), we could either use the formula (1.5.1) and try to evaluate the sum

$$E[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k},$$

or use some of the properties of the expectation of Theorem 1.5.2. To do the latter, we remember that the distribution of a binomial is the same as the distribution of a sum of n (independent) Bernoulli random variables. So if we write X = X_1 + ... + X_n, where each of X_1, ..., X_n has the B(p) distribution, Theorem 1.5.2 yields

$$E[X] = E[X_1] + E[X_2] + \dots + E[X_n] = np. \tag{1.5.3}$$

A similar simplification can be achieved in the computation of the variance, too. While the independence of X_1, ..., X_n was unimportant in (1.5.3), it is crucial for Theorem 1.5.4:

$$\operatorname{Var}[X] = \operatorname{Var}[X_1] + \dots + \operatorname{Var}[X_n] = npq,$$

and, so, sd[X] = √(npq).

  3. Geometric distribution. The trick from 2. above cannot be applied to geometric random variables. If nothing else, this is because Theorem 1.5.2 can only be applied to a given (fixed, nonrandom) number n of random variables. We can still use the definition (1.5.1) and evaluate an infinite sum:

$$E[X] = \sum_{k=0}^{\infty} k p q^k.$$

Instead of doing that, let us proceed somewhat informally and note that we can think of a geometric random variable as follows:

With probability p our first throw is a success and X = 0. With probability q our first throw is a failure and we restart the experiment on the second throw, making sure to add the first failure to the count.

Therefore, E[X] = p × 0 + q × (1 + E[X]), and, so, E[X] = q/p. Similar reasoning can be applied to obtain

$$E[X^2] = p \times 0 + q E[(1 + X)^2] = q + 2q E[X] + q E[X^2] = q + 2q^2/p + q E[X^2].$$

Solving for E[X^2] gives E[X^2] = q/p + 2q^2/p^2, which yields Var[X] = E[X^2] − (E[X])^2 = q/p^2 and sd[X] = √q / p.
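The closed-form answers of Example 1.5.6 can be cross-checked numerically from the distribution tables. The sketch below does this exactly for the binomial and, since the geometric support is infinite, approximately for the geometric by truncating the sums. The values p = 3/10 and n = 10, the truncation point K = 500 and the helper name `moments` are all arbitrary choices for this illustration, and the binomial pmf used is the standard one.

```python
from fractions import Fraction
from math import comb

def moments(pmf):
    """Return (E[X], Var[X]) for a pmf given as a dict {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
    return mean, var

p = Fraction(3, 10)
q = 1 - p
n = 10

# Binomial b(n, p): exact arithmetic, so the comparison is an exact equality.
binom = {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}
print(moments(binom) == (n * p, n * p * q))  # True

# Geometric g(p): truncate the infinite sums at K terms and use floating point.
# The neglected tail is of order q**K, which is negligible here.
K = 500
pf, qf = float(p), float(q)
geom = {k: qf ** k * pf for k in range(K)}
g_mean, g_var = moments(geom)
print(g_mean, qf / pf)       # both approximately 2.3333
print(g_var, qf / pf ** 2)   # both approximately 7.7778
```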

  1. Y − 5, where Y ∼ g(p) (geometric),

  2. 2Y, where Y ∼ P(λ) (Poisson).

Problem 1.6.4. Let Y denote the number of tosses of a fair die until the first 6 is obtained (if we get a 6 on the first try, Y = 0). The support S_Y of Y is

(a) {0, 1, 2, 3, 4, ...}

(b) {1, 2, 3, 4, 5, 6}

(c) {1/6, 1/6, 1/6, 1/6, 1/6}

(d) {1/6, 5/6 × 1/6, (5/6)^2 × 1/6, (5/6)^3 × 1/6, ...}

(e) none of the above

Problem 1.6.5. The probability that Janet makes a free throw is 0.6. What is the probability that she will make at least 16 out of 23 (independent) throws? Write down the answer as a sum - no need to evaluate it.

Problem 1.6.6. Three unbiased and independent coins are tossed. Let Y_1 be the total number of heads on the first two coins, and let Y be the random variable which is equal to Y_1 if the third coin comes up heads and −Y_1 if it comes up tails. Compute Var[Y].

Problem 1.6.7. A die is thrown and a coin is tossed independently of it. Let Y be the random variable which is equal to the number on the die in case the coin comes up heads and twice the number on the die if it comes up tails.

  1. What is the support S_Y of Y? What is its distribution (pmf)?

  2. Compute E[Y] and Var[Y].

Problem 1.6.8. n people vote in a general election, with only two candidates running. The vote of person i is denoted by Y_i and it can take values 0 and 1, depending on which candidate they voted for (we encode one of them as 0 and the other as 1). We assume that votes are independent of each other and that each person votes for candidate 1 with probability p. If the total number of votes for candidate 1 is denoted by Y, then

(a) Y is a geometric random variable

(b) Y^2 is a binomial random variable

(c) Y is uniform on {0, 1,... , n}

(d) Var[Y] ≤ E [Y]

(e) none of the above

Problem 1.6.9. A discrete random variable Y is said to have a discrete uniform distribution on {0, 1, 2, ..., n}, denoted by Y ∼ u(n), if its distribution table looks like this:

$$\begin{array}{ccccc} 0 & 1 & 2 & \dots & n \\ \hline \frac{1}{n+1} & \frac{1}{n+1} & \frac{1}{n+1} & \dots & \frac{1}{n+1} \end{array}$$

Compute the expectation and the variance of u(n). You may use the following identities: 1 + 2 + ... + n = n(n + 1)/2 and 1^2 + 2^2 + ... + n^2 = n(n + 1)(2n + 1)/6.

Problem 1.6.10. Let X be a Poisson random variable with parameter λ > 0. Compute the following:

  1. P[X ≥ 3],

  2. (*) E [X^3 ]. Note: The sum you need to evaluate is quite difficult. If you don’t know the trick, do not worry. If you know how to use symbolic-computation software such as Mathematica, feel free to use it. We will learn how to do this using generating functions later in the class.

Problem 1.6.11. Let X be a geometric random variable with parameter p ∈ (0, 1), i.e., X ∼ g(p), and let Y = 2^{−X}. Write down (the first few entries in) the distribution table of Y. Compute E[Y] = E[2^{−X}].

Problem 1.6.12. Let Y_1 and Y_2 be uncorrelated discrete random variables such that Var[2Y_1 − Y_2] = 17 and Var[Y_1 + 2Y_2] = 5. Compute Var[Y_1 − Y_2]. Note: Y_1 and Y_2 are uncorrelated if E[(Y_1 − E[Y_1])(Y_2 − E[Y_2])] = 0. (Hint: What is Var[αY_1 + βY_2] in terms of Var[Y_1] and Var[Y_2] when Y_1 and Y_2 are uncorrelated?)

Problem 1.6.13. Let Y_1 and Y_2 be uncorrelated random variables such that sd[Y_1 + Y_2] = 5. Then sd[Y_1 − Y_2] =

(a) 1 (b) √2 (c) √3 (d) 5 (e) not enough information is given