



















Course: Introduction to Stochastic Processes Term: Fall 2019 Instructor: Gordan Žitković
A large chunk of probability is about random variables. Instead of giving a precise definition, let us just mention that a random variable can be thought of as an uncertain quantity (usually numerical, i.e., with values in the set of real numbers R, but not always).
While it is true that we do not know with certainty what value a random variable X will take, we usually know how to assign a number - the probability - that its value will be in some^1 subset of R. For example, we might be interested in P[X ≥ 7], P[X ∈ [2, 3.1]] or P[X ∈ {1, 2, 3}].
Random variables are usually divided into discrete and continuous, even though there exist random variables which are neither discrete nor continuous. Those can be safely neglected for the purposes of this course, but they play an important role in many areas of probability and statistics.
Before we define discrete random variables, we need some vocabulary.
Definition 1.2.1. Given a set B, we say that the random variable X is B-valued if P[X ∈ B] = 1.
In words, X is B-valued if we know for a fact that X will never take a value outside of B.
Definition 1.2.2. A random variable X is said to be discrete if there exists a set S such that S is either finite or countable^a and X is S-valued.
^a Countable means that its elements can be enumerated by the natural numbers. The only (infinite) countable sets we will need are N = {1, 2, ...} and N_0 = {0, 1, 2, ...}.
^1 We will not worry about measurability and similar subtleties in this class.
Definition 1.2.3. The support S_X of the discrete random variable X is the smallest set S such that X is S-valued.
Example 1.2.4. A die is thrown and the number obtained is recorded and denoted by X. The possible values of X form the set S = {1, 2, 3, 4, 5, 6} and each happens with probability 1/6, so X is certainly S-valued. Since S is finite, X is discrete. One still needs to argue that S is the support S_X of X. The alternative would be that S_X is a proper subset of S, i.e., that there are redundant elements in S. This is not the case since all elements in S are “important”, i.e., happen with positive probability. If we remove anything from S, we are omitting a possible value for X. On the other hand, it is certainly true that X always takes its values in the finite set S′ = {1, 2, 3, 4, 5, 6, 7}, i.e., that X is S′-valued. One has to be careful with the terminology here: it is correct to say that X is an S′-valued (or even N-valued) random variable, even though it only takes the values 1, 2, ..., 6 with positive probabilities.
Discrete random variables are very nice due to the following fact: in order to be able to compute any conceivable probability involving a discrete random variable X, it is enough to know how to compute the probabilities P[X = x], for all x ∈ S_X. Indeed, if you are interested in figuring out what P[X ∈ B] is, for some set B ⊆ R (e.g., B = {5, 6, 7}, B = [3, 6], or B = [−2, ∞)), we simply pick all x ∈ S_X which are also in B and sum their probabilities. In mathematical notation, we have
$$ P[X \in B] = \sum_{x \in S_X \cap B} P[X = x]. \tag{1.2.1} $$
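To see the summation rule in action, here is a minimal sketch for the fair die of Example 1.2.4. The dictionary `pmf` and the helper `prob` are names made up for this illustration (they are not part of the notes); the set B is passed in as a membership test, and exact fractions are used so that no rounding enters the picture.

```python
from fractions import Fraction

# pmf of a fair die as {value: probability}; the keys play the role of the support S_X
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def prob(pmf, in_B):
    """P[X in B]: add up P[X = x] over those x in the support that also lie in B."""
    return sum((p for x, p in pmf.items() if in_B(x)), Fraction(0))

p_set = prob(pmf, lambda x: x in {5, 6, 7})     # B = {5, 6, 7}
p_interval = prob(pmf, lambda x: 3 <= x <= 6)   # B = [3, 6]
p_ray = prob(pmf, lambda x: x >= -2)            # B = [-2, infinity)

print(p_set, p_interval, p_ray)  # prints: 1/3 2/3 1
```

Note that 7 belongs to B = {5, 6, 7} but not to the support, so it simply contributes nothing to the sum.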
Definition 1.2.5. The probability mass function (pmf) of a discrete random variable X is a function p_X defined on the support S_X of X by
p_X(x) = P[X = x], x ∈ S_X.
In practice, we usually present the pmf p_X in the form of a table (called the distribution table) as

x      | x_1 | x_2 | x_3 | ...
p_X(x) | p_1 | p_2 | p_3 | ...

or, simply,
$$ X \sim \begin{pmatrix} x_1 & x_2 & x_3 & \dots \\ p_1 & p_2 & p_3 & \dots \end{pmatrix} $$
difference is that the result is not a number anymore. The set S of all possible values can be represented as the set of all pairs like (♠, 7), where the first entry denotes the picked card’s suit (in {♥, ♠, ♣, ♦}), and the second is a number between 1 and 13. It is, of course, possible to use different conventions and use the set {2, 3,... , 9, 10, J, Q, K, A} for the second component. The point is that the values X takes are not numbers.
Random variables X which can only take one of two values 0, 1, i.e., for which S_X ⊆ {0, 1}, are called indicators or Bernoulli random variables and are very useful in probability and statistics (and elsewhere). The name comes from the fact that you should think of such variables as signal lights; if X = 1 an event of interest has happened, and if X = 0 it has not happened. In other words, X indicates the occurrence of an event. One reason the Bernoulli random variables are so useful is that they let us manipulate events without ever leaving the language of random variables. Here is an example:
Example 1.3.1. Suppose that two dice are thrown so that X_1 and X_2 are the numbers obtained (both X_1 and X_2 are discrete random variables with S_{X_1} = S_{X_2} = {1, 2, 3, 4, 5, 6}). If we are interested in the probability that their sum is at least 9, we proceed as follows. We define the random variable W - the sum of X_1 and X_2 - by W = X_1 + X_2. Another random variable, let us call it X, is a Bernoulli random variable defined by
$$ X = \begin{cases} 1, & \text{if } W \geq 9, \\ 0, & \text{if } W < 9. \end{cases} $$
With such a set-up, X signals whether the event of interest has happened, and we can state our original problem in terms of X, namely “Compute P[X = 1]!”.
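For readers who like to check such things by brute force, here is a small sketch that computes P[X = 1] by listing all outcomes. It assumes (the example does not say so explicitly) that the two dice are fair and independent, so that all 36 pairs are equally likely, and it takes X to be the indicator of the event that the sum W is at least 9. The function name `indicator` is ours.

```python
from fractions import Fraction
from itertools import product

# Assumption: fair, independent dice, so the 36 pairs (x1, x2) are equally likely
outcomes = list(product(range(1, 7), repeat=2))

def indicator(x1, x2):
    """The Bernoulli variable X: 1 if W = x1 + x2 is at least 9, and 0 otherwise."""
    w = x1 + x2
    return 1 if w >= 9 else 0

# P[X = 1] = (number of pairs on which the indicator lights up) / 36
p_one = Fraction(sum(indicator(x1, x2) for x1, x2 in outcomes), len(outcomes))
print(p_one)  # prints: 5/18
```

Summing the indicator over equally likely outcomes and dividing by their number is the same as counting the favorable pairs (there are 10 of them).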
This example is, admittedly, a little contrived. The point, however, is that anything can be phrased in terms of random variables; thus, if you know how to work with random variables, i.e., know how to compute their distributions, you can solve any problem in probability that comes your way. Another reason Bernoulli random variables are useful is the fact that we can do arithmetic with them.
Example 1.3.2. 70 coins are tossed and their outcomes are denoted by W_1, W_2, ..., W_70. All W_i are random variables with values in {H, T} (and therefore not Bernoulli random variables), but they can be easily recoded into Bernoulli random variables as follows:
$$ X_i = \begin{cases} 1, & \text{if } W_i = H, \\ 0, & \text{if } W_i = T. \end{cases} $$
Once you have the “dictionary” {1 ↔ H, 0 ↔ T}, random variables X_i and W_i carry exactly the same information. The advantage of using X_i is that the random variable
$$ N = \sum_{i=1}^{70} X_i, $$
which takes values in S_N = {0, 1, 2, ..., 70}, counts the number of heads among W_1, ..., W_70. Similarly, the random variable
M = X_1 × X_2 × · · · × X_70
is a Bernoulli random variable itself. What event does it indicate?
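The recoding is easy to try out on a computer. The sketch below simulates the 70 tosses (assuming, for illustration only, fair and independent coins and a fixed seed), builds the Bernoulli variables, and forms their sum and their product. Playing with it, perhaps with fewer coins, may help with the question about M.

```python
import math
import random

random.seed(0)  # fixed seed, only to make the run reproducible

# Assumption for illustration: each coin is fair and tosses are independent
W = [random.choice("HT") for _ in range(70)]

# The dictionary {1 <-> H, 0 <-> T}
X = [1 if w == "H" else 0 for w in W]

N = sum(X)        # arithmetic with indicators: the sum counts the heads
M = math.prod(X)  # the product is again 0 or 1

print(N, M)
```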
The distribution of a random variable is sometimes defined as “the collection of all possible probabilities associated to it”. This sounds a bit abstract, and, at least in the discrete case, obscures the practical significance of this important concept. We have learned that for discrete variables the knowledge of the pmf or the distribution table (such as the one in part 1., 2. or 3. of Example
Example 1.4.1.
Recall that the binomial distribution arises as the “number of successes in n independent Bernoulli trials”, i.e., it counts the number of H in n independent tosses of a biased coin whose probability of H is p.
$$ \begin{pmatrix} 0 & 1 & 2 & 3 & \dots \\ p & qp & q^2 p & q^3 p & \dots \end{pmatrix} $$
Figure 2. The probability mass function (pmf) of a typical geometric distribution.
Caveat: When defining the geometric distribution, some people count the number of trials to the first success, i.e., add the final success into the count. This shifts everything by 1 and leads to a distribution with support N (and not N_0). While this is no big deal, this ambiguity tends to be confusing at times and leads to bugs in software. For us, the geometric distribution will always start from 0. The distribution which counts the final success will be referred to as the shifted geometric distribution, but we’ll try to avoid it altogether.
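Since the caveat mentions software bugs, here is a minimal sketch that puts the two conventions side by side. The function names `geom_pmf` and `shifted_geom_pmf` are made up for this illustration. As far as we know, common numerical libraries (NumPy and SciPy among them) use the shifted convention with support starting at 1, but you should always check the documentation of the library you actually use.

```python
from math import isclose

def geom_pmf(k, p):
    """Convention of these notes: failures before the first success, k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

def shifted_geom_pmf(n, p):
    """Shifted convention: trials up to and including the first success, n = 1, 2, ..."""
    return (1 - p) ** (n - 1) * p

p = 0.3
K = 2000  # truncation point; the neglected tail is astronomically small for p = 0.3

# The two pmfs are the same numbers, attached to values that differ by 1 ...
same_up_to_shift = all(
    isclose(geom_pmf(k, p), shifted_geom_pmf(k + 1, p)) for k in range(20)
)

# ... so the expectations differ by exactly 1: q/p versus 1/p
mean_from_zero = sum(k * geom_pmf(k, p) for k in range(K))
mean_from_one = sum(n * shifted_geom_pmf(n, p) for n in range(1, K + 1))

print(same_up_to_shift, mean_from_zero, mean_from_one)
```

Mixing the two conventions produces an off-by-one error in every mean (and in every quantile), which is exactly the kind of bug the caveat warns about.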
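$$ \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & \dots \\ e^{-\lambda} & e^{-\lambda}\lambda & e^{-\lambda}\frac{\lambda^2}{2} & e^{-\lambda}\frac{\lambda^3}{3!} & e^{-\lambda}\frac{\lambda^4}{4!} & \dots \end{pmatrix} $$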
The closed form for the pmf is
$$ p_X(x) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x \in \mathbb{N}_0. $$
The Poisson distribution arises as a limit when n → ∞ and p → 0 while np ∼ λ in the binomial distribution.
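The limit statement can be watched numerically. The following sketch fixes λ = 2 (an arbitrary choice), sets p = λ/n so that np = λ exactly, and measures the largest gap between the binomial and the Poisson pmf over the first few values. The function names are ours.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P[X = k] for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P[X = k] for the Poisson distribution with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0
gaps = {}
for n in (10, 100, 10_000):
    p = lam / n  # keeps np = lam while n grows and p shrinks
    gaps[n] = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    print(n, gaps[n])
```

The printed gaps shrink as n grows, which is what the limit predicts. This is only a numerical illustration, not a proof.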
Figure 3. The probability mass function (pmf) of a typical Poisson distribution with λ > 1.
The fundamental properties of the variance/standard deviation are given in the following theorem^3:
Theorem 1.5.4. Suppose that X and Y are random variables and that α is a constant. Then
1. Var[αX] = α^2 Var[X], and
2. if, additionally, X and Y are independent, then
Var[X + Y] = Var[X] + Var[Y].
Caveat: These properties are not the same as the properties of the expectation. First of all the constant comes out of the variance with a square, and second, the variance of the sum is the sum of the individual variances only if additional assumptions, such as the independence between the two variables, are imposed.
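Both points of the caveat can be checked exactly on a small example. The sketch below takes X to be a fair die and compares two choices of Y: an independent second die, and the extreme dependent case Y = X. All helper names (`mean`, `var`, `pmf_of`) are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}  # pmf of a fair die

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

def pmf_of(f, joint):
    """pmf of f(X, Y) when the pair (X, Y) has the joint pmf `joint`."""
    out = {}
    for (x, y), p in joint.items():
        out[f(x, y)] = out.get(f(x, y), 0) + p
    return out

independent = {(x, y): die[x] * die[y] for x, y in product(die, die)}
identical = {(x, x): die[x] for x in die}  # Y = X, as dependent as it gets

var_x = var(die)
var_sum_independent = var(pmf_of(lambda x, y: x + y, independent))
var_sum_identical = var(pmf_of(lambda x, y: x + y, identical))

print(var_x, var_sum_independent, var_sum_identical)  # prints: 35/12 35/6 35/3
```

For independent dice the variances add (35/12 + 35/12 = 35/6). When Y = X, the sum is 2X and the constant comes out squared, giving 4 × 35/12 = 35/3 instead.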
Finally, here is a very useful alternative formula for the variance of a random variable:
Proposition 1.5.5. Var[X] = E[X^2] − (E[X])^2.
Let us compute expectations and variances/standard deviations for our most important examples.
Example 1.5.6.
E[X] = 0 × q + 1 × p = p.
Using (1.5.5), we get
Var[X] = E[X^2] − (E[X])^2 = 0^2 × q + 1^2 × p − p^2 = p − p^2 = p(1 − p) = pq,
and, so, sd[X] = √(pq).
^3 We will talk about independence in detail in the next lecture. An intuitive understanding should be fine for now.
$$ \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k}, $$
or use some of the properties of the expectation of Theorem 1.5.2. To do the latter, we remember that the distribution of a binomial is the same as the distribution of a sum of n (independent) Bernoullies. So if we write X = X_1 + · · · + X_n, and each of X_1, ..., X_n has the B(p) distribution, Theorem 1.5.2 yields
E[X] = E[X_1] + E[X_2] + · · · + E[X_n] = np. (1.5.3)
A similar simplification can be achieved in the computation of the variance, too. While it was unimportant in (1.5.3) that X_1, ..., X_n are independent, it is crucial for Theorem 1.5.4:
Var[X] = Var[X_1] + · · · + Var[X_n] = npq,
and, so, sd[X] = √(npq).
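If you do not trust the shortcut through sums of Bernoullies, the direct computation with the binomial pmf can be left to a computer. The sketch below uses the arbitrary values n = 12 and p = 0.3, computes the mean and the variance straight from the pmf, and compares them with np and npq.

```python
from math import comb, isclose, sqrt

n, p = 12, 0.3  # arbitrary illustrative values
q = 1 - p

# Binomial pmf: P[X = k] = C(n, k) p^k q^(n - k), for k = 0, ..., n
pmf = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))              # E[X], the direct sum
second_moment = sum(k * k * pk for k, pk in enumerate(pmf))  # E[X^2]
variance = second_moment - mean**2                           # Var[X] = E[X^2] - (E[X])^2

print(mean, n * p)
print(variance, n * p * q)
print(sqrt(variance), sqrt(n * p * q))
```

Up to floating-point rounding, the two numbers on each printed line agree.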
$$ E[X] = \sum_{k=0}^{\infty} k p q^k. $$
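Instead of doing that, let us proceed somewhat informally and note that we can think of a geometric random variable as follows:
With probability p our first throw is a success and X = 0. With probability q our first throw is a failure; in that case one throw has been used up and we face a fresh copy of the same problem, so X is distributed like 1 + X.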
Therefore, E[X] = p × 0 + q × (1 + E[X]), and, so, E[X] = q/p. Similar reasoning can be applied to obtain
E[X^2] = p × 0 + q E[(1 + X)^2] = q + 2q E[X] + q E[X^2] = q + 2q^2/p + q E[X^2],
which yields Var[X] = E[X^2] − (E[X])^2 = q/p^2 and sd[X] = √q / p.
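Because the argument above is informal, a numerical sanity check is reassuring. The sketch below uses the arbitrary value p = 0.25 and truncates the infinite sums at K = 2000 terms (our choice; the neglected tail is far below floating-point precision), then compares the results with q/p and q/p^2.

```python
from math import isclose, sqrt

p = 0.25  # arbitrary illustrative value
q = 1 - p
K = 2000  # truncation point for the infinite sums

# Geometric pmf of these notes: P[X = k] = p q^k, for k = 0, 1, 2, ...
pmf = [p * q**k for k in range(K)]

mean = sum(k * pk for k, pk in enumerate(pmf))
second_moment = sum(k * k * pk for k, pk in enumerate(pmf))
variance = second_moment - mean**2

print(mean, q / p)
print(variance, q / p**2)
print(sqrt(variance), sqrt(q) / p)
```

Up to rounding, the two numbers on each printed line agree.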
Y − 5, where Y ∼ g(p) (geometric),
2Y, where Y ∼ P( λ ) (Poisson).
Problem 1.6.4. Let Y denote the number of tosses of a fair die until the first 6 is obtained (if we get a 6 on the first try, Y = 0). The support S_Y of Y is
(a) {0, 1, 2, 3, 4,... }
(b) {1, 2, 3, 4, 5, 6}
(c) {1/6, 1/6, 1/6, 1/6, 1/6}
(d) {1/6, 5/6 × 1/6, (5/6)^2 × 1/6, ...}
(e) none of the above
Problem 1.6.5. The probability that Janet makes a free throw is 0.6. What is the probability that she will make at least 16 out of 23 (independent) throws? Write down the answer as a sum - no need to evaluate it.
Problem 1.6.6. Three unbiased and independent coins are tossed. Let Y_1 be the total number of heads on the first two coins, and let Y be the random variable which is equal to Y_1 if the third coin comes up heads and −Y_1 if it comes up tails. Compute Var[Y].
Problem 1.6.7. A die is thrown and a coin is tossed independently of it. Let Y be the random variable which is equal to the number on the die in case the coin comes up heads and twice the number on the die if it comes up tails.
What is the support S_Y of Y? What is its distribution (pmf)?
Compute E[Y] and Var[Y].
Problem 1.6.8. n people vote in a general election, with only two candidates running. The vote of person i is denoted by Y_i and it can take values 0 and 1, depending on which candidate they voted for (we encode one of them as 0 and the other as 1). We assume that votes are independent of each other and that each person votes for candidate 1 with probability p. If the total number of votes for candidate 1 is denoted by Y, then
(a) Y is a geometric random variable
(b) Y^2 is a binomial random variable
(c) Y is uniform on {0, 1,... , n}
(d) Var[Y] ≤ E [Y]
(e) none of the above
Problem 1.6.9. A discrete random variable Y is said to have a discrete uniform distribution on {0, 1, 2, ..., n}, denoted by Y ∼ u(n), if its distribution table looks like this:
$$ \begin{pmatrix} 0 & 1 & 2 & \dots & n \\ \frac{1}{n+1} & \frac{1}{n+1} & \frac{1}{n+1} & \dots & \frac{1}{n+1} \end{pmatrix} $$
Compute the expectation and the variance of u(n). You may use the following identities: 1 + 2 + · · · + n = n(n + 1)/2 and 1^2 + 2^2 + · · · + n^2 = n(n + 1)(2n + 1)/6.
Problem 1.6.10. Let X be a Poisson random variable with parameter λ > 0. Compute the following:
P[X ≥ 3],
(*) E[X^3]. Note: The sum you need to evaluate is quite difficult. If you don’t know the trick, do not worry. If you know how to use symbolic-computation software such as Mathematica, feel free to use it. We will learn how to do this using generating functions later in the class.
Problem 1.6.11. Let X be a geometric random variable with parameter p ∈ (0, 1), i.e., X ∼ g(p), and let Y = 2^(−X). Write down (the first few entries in) the distribution table of Y. Compute E[Y] = E[2^(−X)].
Problem 1.6.12. Let Y_1 and Y_2 be uncorrelated discrete random variables such that Var[2Y_1 − Y_2] = 17 and Var[Y_1 + 2Y_2] = 5. Compute Var[Y_1 − Y_2]. Note: Y_1 and Y_2 are uncorrelated if E[(Y_1 − E[Y_1])(Y_2 − E[Y_2])] = 0. (Hint: What is Var[αY_1 + βY_2] in terms of Var[Y_1] and Var[Y_2] when Y_1 and Y_2 are uncorrelated?)
Problem 1.6.13. Let Y_1 and Y_2 be uncorrelated random variables such that sd[Y_1 + Y_2] = 5. Then sd[Y_1 − Y_2] =
(a) 1 (b) √2 (c) √3 (d) 5 (e) not enough information is given