

























































































































































































(with 33 illustrations)
The probable is what usually happens.
-- Aristotle

It is a truth very certain that when it is not in our power to determine what is true we ought to follow what is most probable.
-- Descartes, “Discourse on Method”

It is remarkable that a science which began with the consideration of games of chance should have become the most important object of human knowledge.
-- Pierre Simon Laplace, “Théorie Analytique des Probabilités”, 1812

Anyone who considers arithmetic methods of producing random digits is, of course, in a state of sin.
-- John von Neumann, quoted in “Conic Sections” by D. MacHale

I say unto you: a man must have chaos yet within him to be able to give birth to a dancing star: I say unto you: ye have chaos yet within you...
-- Friedrich Nietzsche, “Thus Spake Zarathustra”
1.1 Random variables
Probability is about random variables. Instead of giving a precise definition, let us just mention that a random variable can be thought of as an uncertain, numerical (i.e., with values in R) quantity. While it is true that we do not know with certainty what value a random variable X will take, we usually know how to compute the probability that its value will be in some subset of R. For example, we might be interested in P[X ≥ 7], P[X ∈ [2, 3.1]] or P[X ∈ {1, 2, 3}]. The collection of all such probabilities is called the distribution of X. One has to be very careful not to confuse the random variable itself with its distribution. This point is particularly important when several random variables appear at the same time. When two random variables X and Y have the same distribution, i.e., when P[X ∈ A] = P[Y ∈ A] for any set A, we say that X and Y are equally distributed and write $X \overset{(d)}{=} Y$.
Discrete random variables are very nice due to the following fact: in order to be able to compute any conceivable probability involving a discrete random variable X, it is enough to know how to compute the probabilities P[X = x], for all x ∈ S. Indeed, if we are interested in figuring out how much P[X ∈ B] is, for some set B ⊆ R (e.g., B = [3, 6] or B = [−2, ∞)), we simply pick all x ∈ S which are also in B and sum their probabilities. In mathematical notation, we have
\[ P[X \in B] = \sum_{x \in S \cap B} P[X = x]. \]
For this reason, the distribution of any discrete random variable X is usually described via a table
\[ X \sim \begin{pmatrix} x_1 & x_2 & x_3 & \dots \\ p_1 & p_2 & p_3 & \dots \end{pmatrix}, \]
where the top row lists all the elements of S (the support of X) and the bottom row lists their probabilities ($p_i = P[X = x_i]$, $i \in \mathbb{N}$). When the random variable is $\mathbb{N}$-valued (or $\mathbb{N}_0$-valued), the situation is even simpler because we know what $x_1, x_2, \dots$ are, and we identify the distribution of X with the sequence $p_1, p_2, \dots$ (or $p_0, p_1, p_2, \dots$ in the $\mathbb{N}_0$-valued case), which we call the probability mass function (pmf) of the random variable X.

What about the extended $\mathbb{N}_0$-valued case? It is just as simple, because we can compute the probability P[X = +∞] if we know all the probabilities $p_i = P[X = i]$, $i \in \mathbb{N}_0$. Indeed, we use the fact that
\[ P[X = 0] + P[X = 1] + \dots + P[X = \infty] = 1, \]
so that $P[X = \infty] = 1 - \sum_{i=0}^{\infty} p_i$, where $p_i = P[X = i]$. In other words, if you are given a probability mass function $(p_0, p_1, \dots)$, you simply need to compute the sum $\sum_{i=0}^{\infty} p_i$. If it happens to be equal to 1, you can safely conclude that X never takes the value +∞. Otherwise, the probability of +∞ is positive.

The random variables for which S = {0, 1} are especially useful. They are called indicators. The name comes from the fact that you should think of such variables as signal lights; if X = 1 an event of interest has happened, and if X = 0 it has not happened. In other words, X indicates the occurrence of an event. The notation we use is quite suggestive; for example, if Y is the outcome of a coin-toss, and we want to know whether Heads (H) occurred, we write
\[ X = \mathbf{1}_{\{Y = H\}}. \]
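Going back to the extended N0-valued case discussed above, here is a minimal Mathematica sketch of how the probability of +∞ can be read off from a given sequence. The sequence p_i = (1/2)^(i+2) is a hypothetical choice made only for illustration; it is not taken from the notes.

```mathematica
(* hypothetical sequence on N0, chosen only for illustration: p_i = (1/2)^(i+2) *)
p[i_] := (1/2)^(i + 2)

(* total mass carried by the finite values 0, 1, 2, ... *)
massOnN0 = Sum[p[i], {i, 0, Infinity}]   (* 1/2 *)

(* whatever is missing from 1 must sit at +Infinity *)
probInfinity = 1 - massOnN0              (* 1/2 *)
```

Since the sum is strictly less than 1, a random variable with these probabilities would take the value +∞ with probability 1/2.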
Example 1.2. Suppose that two dice are thrown so that $Y_1$ and $Y_2$ are the numbers obtained (both $Y_1$ and $Y_2$ are discrete random variables with S = {1, 2, 3, 4, 5, 6}). If we are interested in the probability that their sum is at least 9, we proceed as follows. We define the random variable Z, the sum of $Y_1$ and $Y_2$, by $Z = Y_1 + Y_2$. Another random variable, let us call it X, is defined by $X = \mathbf{1}_{\{Z \ge 9\}}$, i.e.,
\[ X = \begin{cases} 1, & Z \ge 9, \\ 0, & \text{otherwise.} \end{cases} \]
With such a set-up, X signals whether the event of interest has happened, and we can state our original problem in terms of X: “Compute P[X = 1]!”. Can you compute it?
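One possible way to check an answer to Example 1.2 is to enumerate the outcomes in Mathematica. The sketch below assumes that the dice are fair and thrown independently, so that all 36 pairs are equally likely; the example itself does not state this explicitly. The names outcomes and x are illustrative.

```mathematica
(* all 36 pairs {y1, y2}; assumed equally likely (fair, independent dice) *)
outcomes = Tuples[Range[6], 2];

(* value of the indicator X = 1_{Z >= 9} on each outcome *)
x = Boole[Total[#] >= 9] & /@ outcomes;

(* P[X = 1] = (number of outcomes with X = 1) / 36 *)
Total[x]/Length[outcomes]   (* 5/18 *)
```

Ten of the 36 pairs have a sum of at least 9, which gives 10/36 = 5/18 under the stated assumption.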
1.4 Expectation
For a discrete random variable X with support S, we define the expectation E[X] of X by
\[ E[X] = \sum_{x \in S} x \, P[X = x], \]
as long as the (possibly) infinite sum $\sum_{x \in S} x \, P[X = x]$ converges absolutely. When the sum does not converge, or if it converges only conditionally, we say that the expectation of X is not defined. When the random variable in question is $\mathbb{N}_0$-valued, the expression above simplifies to
\[ E[X] = \sum_{i=0}^{\infty} i \, p_i, \]
where $p_i = P[X = i]$, for $i \in \mathbb{N}_0$. Unlike in the general case, the absolute convergence of the defining series can fail in essentially one way, i.e., when
\[ \lim_{n \to \infty} \sum_{i=0}^{n} i \, p_i = +\infty. \]
In that case, the expectation does not formally exist. We still write E[X] = +∞, but really mean that the defining sum diverges towards infinity.

Once we know what the expectation is, we can easily define several more common terms:
Definition 1.3. Let X be a discrete random variable.
- If the expectation E[X] exists, we say that X is integrable.
- If E[X^2] < ∞ (i.e., if X^2 is integrable), X is called square-integrable.
- If E[|X|^m] < ∞, for some m > 0, we say that X has a finite m-th moment.
- If X has a finite m-th moment, the expectation E[|X − E[X]|^m] exists and we call it the m-th central moment.
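As a small illustration of the expectation and of the moments in Definition 1.3, the following Mathematica sketch computes them for a single fair die. The fair die is an assumption made for this example, and the variable names are illustrative.

```mathematica
(* support and pmf of one fair die (assumed uniform on 1..6) *)
support = Range[6];
pmf = ConstantArray[1/6, 6];

(* E[X] = sum of x P[X = x] over the support *)
mean = support . pmf                          (* 7/2 *)

(* second moment E[X^2] *)
secondMoment = (support^2) . pmf              (* 91/6 *)

(* second central moment E[|X - E[X]|^2] *)
secondCentral = ((support - mean)^2) . pmf    (* 35/12 *)
```

Because the support is finite, every sum here is finite, so this X is integrable, square-integrable and has finite moments of all orders.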
It can be shown that the expectation E possesses the following properties, where X and Y are both assumed to be integrable:
1.6 Dependence and independence
One of the main differences between random variables and (deterministic or non-random) quantities is that in the former case the whole is more than the sum of its parts. What do I mean by that? When two random variables, say X and Y, are considered in the same setting, you must specify more than just their distributions if you want to compute probabilities that involve both of them. Here are two examples.
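In both cases, both X and Y have the same distribution
\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} \end{pmatrix}. \]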
The pairs (X, Y) are, however, very different in the two examples. In the first one, if the value of X is revealed, it will not affect our view of the value of Y. Indeed, the dice are not “connected” in any way (they are independent in the language of probability). In the second case, the knowledge of X allows us to say what Y is without any doubt: it is 7 − X. This example shows that when more than one random variable is considered, one needs to obtain external information about their relationship; not everything can be deduced only by looking at their distributions (pmfs, or ...).

One of the most common forms of relationship two random variables can have is the one of example (1) above, i.e., no relationship at all. More formally, we say that two (discrete) random variables X and Y are independent if
P[X = x and Y = y] = P[X = x]P[Y = y],
for all x and y in the respective supports $S_X$ and $S_Y$ of X and Y. The same concept can be applied to events, and we say that two events A and B are independent if
P[A ∩ B] = P[A]P[B].
The notion of independence is central to probability theory (and this course) because it is relatively easy to spot in real life. If there is no physical mechanism that ties two events (like the two dice we throw), we are inclined to declare them independent². One of the most important tasks in probabilistic modelling is the identification of the (small number of) independent random variables which serve as building blocks for a big complex system. You will see many examples of that as we proceed through the course.

² Actually, true independence does not exist in reality, save, perhaps, for a few quantum-theoretic phenomena. Even with apparently independent random variables, dependence can sneak in in the most sly of ways. Here is a funny example: a recent survey has found a large correlation between the sale of diapers and the sale of six-packs of beer across many Walmart stores throughout the country. At first these two appear independent, but I am sure you can come up with many an amusing story why they should, actually, be quite dependent.
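The product condition for independence can be checked mechanically once the joint probabilities are written down. The Mathematica sketch below does this for two hypothetical pairs: two fair dice thrown separately, and a single fair die X paired with Y = 7 − X. Both set-ups are assumptions chosen to mirror the discussion above, and independentQ is an illustrative helper, not a built-in command.

```mathematica
(* joint[[x, y]] = P[X = x and Y = y], for x, y = 1..6 *)
jointTwoDice = ConstantArray[1/36, {6, 6}];                       (* assumed: two separate fair dice *)
jointTied = Table[If[y == 7 - x, 1/6, 0], {x, 6}, {y, 6}];        (* assumed: Y = 7 - X *)

(* compare the joint pmf with the product of its two marginals *)
independentQ[joint_] := Module[{px, py},
  px = Total[joint, {2}];   (* row sums:    P[X = x] *)
  py = Total[joint, {1}];   (* column sums: P[Y = y] *)
  joint === Outer[Times, px, py]]

independentQ[jointTwoDice]   (* True *)
independentQ[jointTied]      (* False *)
```

In both cases the marginals are uniform on {1, ..., 6}, so the marginals alone cannot distinguish the two situations; only the joint probabilities can.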
1.7 Conditional probability
When two random variables are not independent, we still want to know how the knowledge of the exact value of one of them affects our guesses about the value of the other. That is what the conditional probability is for. We start with the definition, and we state it for events first: for two events A, B such that P[B] > 0, the conditional probability P[A|B] of A given B is defined as
\[ P[A|B] = \frac{P[A \cap B]}{P[B]}. \]
The conditional probability is not defined when P[B] = 0 (otherwise, we would be computing 0/0; why?). Every statement in the sequel which involves conditional probability will be assumed to hold only when P[B] > 0, without explicit mention.

The conditional probability calculations often use one of the following two formulas. Both of them use the familiar concept of partition. If you forgot what it is, here is a definition: a collection $A_1, A_2, \dots, A_n$ of events is called a partition of Ω if a) $A_1 \cup A_2 \cup \dots \cup A_n = \Omega$ and b) $A_i \cap A_j = \emptyset$ for all pairs i, j = 1, ..., n with i ≠ j. So, let $A_1, \dots, A_n$ be a partition of Ω, and let B be an event.

1. The law of total probability:
\[ P[B] = \sum_{i=1}^{n} P[B|A_i] \, P[A_i]. \]
2. Bayes' formula: for k = 1, ..., n,
\[ P[A_k|B] = \frac{P[B|A_k] \, P[A_k]}{\sum_{i=1}^{n} P[B|A_i] \, P[A_i]}. \]
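To see the two formulas at work, here is a minimal Mathematica sketch. It reuses the two-dice setting under the assumption of fair, independent dice: the partition is A_i = {first die shows i}, i = 1, ..., 6, and B is the event that the sum of the two dice is at least 9. The variable names are illustrative.

```mathematica
(* P[A_i]: the first die shows i (fair die assumed) *)
pA = ConstantArray[1/6, 6];

(* P[B | A_i]: the second die y must satisfy i + y >= 9 *)
pBgivenA = Table[Count[Range[6], y_ /; i + y >= 9]/6, {i, 6}];
(* {0, 0, 1/6, 1/3, 1/2, 2/3} *)

(* law of total probability *)
pB = pBgivenA . pA               (* 5/18 *)

(* Bayes' formula, for all k = 1..6 at once *)
posterior = pBgivenA*pA/pB       (* {0, 0, 1/10, 1/5, 3/10, 2/5} *)
```

The entries of posterior are the conditional probabilities P[A_k|B]; they add up to 1, as they should for a partition.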
Even though the formulas above are stated for finite partitions, they remain true when the number of $A_k$'s is countably infinite. The finite sums have to be replaced by infinite series, however. Random variables can be substituted for events in the definition of conditional probability as follows: for two random variables X and Y, the conditional probability that X = x, given Y = y (with x and y in the respective supports $S_X$ and $S_Y$), is given by
\[ P[X = x | Y = y] = \frac{P[X = x \text{ and } Y = y]}{P[Y = y]}. \]
The formula above produces a different probability distribution for each y. This is called the conditional distribution of X, given Y = y. We give a simple example to illustrate this concept. Let X be the number of heads obtained when two coins are thrown, and let Y be the indicator of the event that the second coin shows heads. The distribution of X is Binomial:
\[ X \sim \begin{pmatrix} 0 & 1 & 2 \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} \end{pmatrix}, \]
or, in the more compact notation which we use when the support is clear from the context, X ∼ (1/4, 1/2, 1/4). The random variable Y has the Bernoulli distribution Y ∼ (1/2, 1/2). What happens
When several random variables ($X_1, X_2, \dots, X_n$) are considered in the same setting, we often group them together into a random vector. The distribution of the random vector $X = (X_1, \dots, X_n)$ is the collection of all probabilities of the form
\[ P[X_1 = x_1, X_2 = x_2, \dots, X_n = x_n], \]
when $x_1, x_2, \dots, x_n$ range through all numbers in the appropriate supports. Unlike in the case of a single random variable, writing down the distributions of random vectors in tables is a bit more difficult. In the two-dimensional case, one would need an entire matrix, and in the higher dimensions some sort of a hologram would be the only hope. The distributions of the components $X_1, \dots, X_n$ of the random vector X are called the marginal distributions of the random variables $X_1, \dots, X_n$. When we want to stress the fact that the random variables $X_1, \dots, X_n$ are a part of the same random vector, we call the distribution of X the joint distribution of $X_1, \dots, X_n$. It is important to note that, unless random variables $X_1, \dots, X_n$ are a priori known to be independent, the joint distribution holds more information about X than all marginal distributions together.
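For the two-coin example above (X is the number of heads, Y indicates heads on the second coin), the joint distribution fits into a 3 × 2 matrix. The Mathematica sketch below builds it and reads off the marginal and conditional distributions. It assumes two fair, independent coins with heads encoded as 1; the variable names are illustrative.

```mathematica
(* the four outcomes {c1, c2}, assumed equally likely; 1 = heads, 0 = tails *)
coins = Tuples[{0, 1}, 2];

(* joint[[x + 1, y + 1]] = P[X = x and Y = y], with X = c1 + c2 and Y = c2 *)
joint = Table[
   Count[coins, {c1_, c2_} /; (c1 + c2 == x && c2 == y)]/4,
   {x, 0, 2}, {y, 0, 1}];
(* {{1/4, 0}, {1/4, 1/4}, {0, 1/4}} *)

marginalX = Total[joint, {2}]    (* {1/4, 1/2, 1/4} *)
marginalY = Total[joint, {1}]    (* {1/2, 1/2} *)

(* conditional distribution of X given Y = 1: second column divided by P[Y = 1] *)
condXgivenY1 = joint[[All, 2]]/marginalY[[2]]   (* {0, 1/2, 1/2} *)
```

The marginals agree with the distributions of X and Y quoted in the text, while the conditional distribution of X given Y = 1 differs from the marginal of X, which shows that X and Y are not independent.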
1.8 Examples
Here is a short list of some of the most important discrete random variables. You will learn about generating functions soon.
Example 1.9.

Bernoulli. Success (1) or failure (0), where success has probability p (if success is encoded by 1, failure by −1 and p = 1/2, we call it the coin toss).
- parameters: p ∈ (0, 1) (q = 1 − p)
- notation: b(p)
- support: {0, 1}
- pmf: $p_0 = q = 1 - p$ and $p_1 = p$
- generating function: ps + q
- mean: p
- standard deviation: $\sqrt{pq}$
- figure: the mass function of a Bernoulli distribution with p = 1/3.

Binomial. The number of successes in n repetitions of a Bernoulli trial with success probability p.
- parameters: n ∈ N, p ∈ (0, 1) (q = 1 − p)
- notation: b(n, p)
- support: {0, 1, ..., n}
- pmf: $p_k = \binom{n}{k} p^k q^{n-k}$, k = 0, ..., n
- generating function: $(ps + q)^n$
- mean: np
- standard deviation: $\sqrt{npq}$
- figure: mass functions of three binomial distributions with n = 50 and p = 0.05 (blue), p = 0.5 (purple) and p = 0.8 (yellow).

Poisson. The number of spelling mistakes one makes while typing a single page.
- parameters: λ > 0
- notation: p(λ)
- support: $\mathbb{N}_0$
- pmf: $p_k = e^{-\lambda} \frac{\lambda^k}{k!}$, $k \in \mathbb{N}_0$
- generating function: $e^{\lambda(s-1)}$
- mean: λ
- standard deviation: $\sqrt{\lambda}$
- figure: mass functions of two Poisson distributions with parameters λ = 0.9 (blue) and λ = 10 (purple).

Geometric. The number of failures before the first success in repetitions of a Bernoulli trial with parameter p.
- parameters: p ∈ (0, 1), q = 1 − p
- notation: g(p)
- support: $\mathbb{N}_0$
- pmf: $p_k = p q^k$, $k \in \mathbb{N}_0$
- generating function: $\frac{p}{1 - qs}$
- mean: $\frac{q}{p}$
- standard deviation: $\frac{\sqrt{q}}{p}$
- figure: mass functions of two Geometric distributions with parameters p = 0.1 (blue) and p = 0.4 (purple).
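The entries in the list above can be spot-checked in Mathematica. The sketch below sums a geometric pmf directly and then compares a few means with Mathematica's built-in distribution objects; the parameter values (p = 2/5, n = 50, λ = 10) are arbitrary choices for illustration, and geomPmf is an illustrative name.

```mathematica
(* geometric pmf on N0, with the success probability p passed explicitly *)
geomPmf[k_, p_] := p (1 - p)^k

Sum[geomPmf[k, 2/5], {k, 0, Infinity}]       (* 1: the probabilities add up to one *)
Sum[k geomPmf[k, 2/5], {k, 0, Infinity}]     (* 3/2, which is q/p for p = 2/5 *)

(* built-in counterparts *)
Mean[GeometricDistribution[2/5]]                    (* 3/2 *)
Mean[BinomialDistribution[50, 1/2]]                 (* 25 = n p *)
StandardDeviation[BinomialDistribution[50, 1/2]]    (* equals Sqrt[n p q] = Sqrt[25/2] *)
Mean[PoissonDistribution[10]]                       (* 10 = lambda *)
```

Mathematica's GeometricDistribution is supported on 0, 1, 2, ..., which matches the support given for g(p) here.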
Mathematica is a glorified calculator. Here is how to use it¹.
2.1 Basic Syntax
- Symbols +, -, /, ^, * are all supported by Mathematica. Multiplication can be represented by a space between variables: a x + b and a*x + b are identical.
- Warning: Mathematica is case-sensitive. For example, the command to exit is Quit and not quit or QUIT.
- Brackets are used around function arguments. Write Sin[x], not Sin(x) or Sin{x}.
- Parentheses ( ) group terms for math operations: (Sin[x]+Cos[y])(Tan[z]+z^2).
- If you end an expression with a ; (semi-colon) it will be executed, but its output will not be shown. This is useful, e.g., for simulations.
- Braces { } are used for lists:

In[1]:= A = {1, 2, 3}
Out[1]= {1, 2, 3}
- Names can refer to variables, expressions, functions, matrices, graphs, etc. A name is assigned using name = object. An expression may contain undefined names:
In[5]:= A = (a + b)^3
Out[5]= (a + b)^3
In[6]:= A^2
Out[6]= (a + b)^6

¹ Actually, this is just the tip of the iceberg. It can do many many many other things.
- The percent sign % stores the value of the previous result:

In[7]:= 5 + 3
Out[7]= 8
In[8]:= %^2
Out[8]= 64
2.2 Numerical Approximation
- N[expr] gives the approximate numerical value of expression, variable, or command:
In[9]:= N[Sqrt[2]]
Out[9]= 1.41421

- N[%] gives the numerical value of the previous result:

In[17]:= E + Pi
Out[17]= e + π

In[18]:= N[%]
Out[18]= 5.85987

- N[expr,n] gives n digits of precision for the expression expr:

In[14]:= N[Pi, 30]
Out[14]= 3.14159265358979323846264338328

- Expressions whose result can’t be represented exactly don’t give a value unless you request approximation:

In[11]:= Sin[3]
Out[11]= Sin[3]
In[12]:= N[Sin[3]]
Out[12]= 0.14112
2.3 Expression Manipulation
- Expand[expr] (algebraically) expands the expression expr:
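A minimal illustration of Expand (not taken from the original notes; the symbols x and y are assumed to have no values assigned):

```mathematica
(* expand a square and a product of two factors *)
Expand[(x + y)^2]          (* x^2 + 2 x y + y^2 *)
Expand[(x + 1) (x - 1)]    (* -1 + x^2 *)
```

Mathematica prints the terms in its own canonical order, which is why the constant −1 appears first in the second output.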
- If the expression expr depends on a variable (say i), Table[expr,{i,m,n}] produces a list of the values of the expression expr as i ranges from m to n:

In[37]:= Table[i^2, {i, 0, 5}]
Out[37]= {0, 1, 4, 9, 16, 25}

- The same works with two indices; you will get a list of lists:

In[40]:= Table[i^j, {i, 1, 3}, {j, 2, 3}]
Out[40]= {{1, 1}, {4, 8}, {9, 27}}
- It is possible to define your own functions in Mathematica. Just use the underscore syntax f[x_]=expr, where expr is some expression involving x:
In[47]:= f[x_] = x^2
Out[47]= x^2
In[48]:= f[x + y]
Out[48]= (x + y)^2

- To apply the function f (either built-in, like Sin, or defined by you) to each element of the list L, you can use the command Map with syntax Map[f,L]:

In[50]:= f[x_] = 3*x
Out[50]= 3 x
In[51]:= L = {1, 2, 3, 4}
Out[51]= {1, 2, 3, 4}
In[52]:= Map[f, L]
Out[52]= {3, 6, 9, 12}

- If you want to add all the elements of a list L, use Total[L]. The list of the same length as L whose k-th element is the sum of the first k elements of L is given by Accumulate[L]:

In[8]:= L = {1, 2, 3, 4, 5}
Out[8]= {1, 2, 3, 4, 5}
In[9]:= Accumulate[L]
Out[9]= {1, 3, 6, 10, 15}
In[10]:= Total[L]
Out[10]= 15
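Table, Total and Accumulate combine naturally when a list holds a probability mass function. The sketch below is one possible use, with a binomial pmf for n = 4 and p = 1/2 chosen only as an illustration: Total checks that the probabilities add up to 1, and Accumulate returns the running sums P[X ≤ k].

```mathematica
(* pmf of a binomial random variable with n = 4, p = 1/2 (illustrative choice) *)
pmf = Table[Binomial[4, k] (1/2)^4, {k, 0, 4}]   (* {1/16, 1/4, 3/8, 1/4, 1/16} *)

Total[pmf]        (* 1 *)
Accumulate[pmf]   (* {1/16, 5/16, 11/16, 15/16, 1} *)
```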
2.5 Linear Algebra
- In Mathematica, a matrix is a nested list, i.e., a list whose elements are lists. By convention, matrices are represented row by row (inner lists are row vectors).
- To access the element in the i-th row and j-th column of the matrix A, type A[[i,j]] or A[[i]][[j]]:

In[59]:= A = {{2, 1, 3}, {5, 6, 9}}
Out[59]= {{2, 1, 3}, {5, 6, 9}}
In[60]:= A[[2, 3]]
Out[60]= 9
In[61]:= A[[2]][[3]]
Out[61]= 9
- MatrixForm[expr] displays expr as a matrix (provided it is a nested list):

In[9]:= A = Table[i*2^j, {i, 2, 5}, {j, 1, 2}]
Out[9]= {{4, 8}, {6, 12}, {8, 16}, {10, 20}}

In[10]:= MatrixForm[A]
Out[10]//MatrixForm=
   4   8
   6  12
   8  16
  10  20
- Commands Transpose[A], Inverse[A], Det[A], Tr[A] and MatrixRank[A] return the transpose, inverse, determinant, trace and rank of the matrix A, respectively.
- To compute the n-th power of the matrix A, use MatrixPower[A,n]:

In[21]:= A = {{1, 1}, {1, 0}}
Out[21]= {{1, 1}, {1, 0}}

In[22]:= MatrixForm[MatrixPower[A, 5]]
Out[22]//MatrixForm=
  8  5
  5  3
- The identity matrix of order n is produced by IdentityMatrix[n].
- If A and B are matrices of the same order, A+B and A-B are their sum and difference.
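As a quick illustration of the commands listed above, here is a sketch that applies them to the 2 × 2 matrix used in the MatrixPower example. The last line uses the dot operator (.) for matrix multiplication, which is standard Mathematica but is not introduced in the text above.

```mathematica
A = {{1, 1}, {1, 0}};

Transpose[A]     (* {{1, 1}, {1, 0}}: A is symmetric *)
Inverse[A]       (* {{0, 1}, {1, -1}} *)
Det[A]           (* -1 *)
Tr[A]            (* 1 *)
MatrixRank[A]    (* 2 *)

(* sanity check: A times its inverse is the identity matrix of order 2 *)
A . Inverse[A] == IdentityMatrix[2]   (* True *)
```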