Probability and Random Variables: A Comprehensive Guide with Examples, Study notes of Probability and Statistics

Notes on probability by Peter J. Cameron, covering basic ideas, conditional probability, and random variables. The notes are available on the website of Queen Mary University of London, along with other course materials, past exam papers, and solutions. The document also provides links to other web resources related to probability, such as the free textbook Introduction to Probability, virtual laboratories in probability and statistics, and an article on Venn diagrams. It is a useful resource for students studying probability at university level.

Typology: Study notes

2021/2022

Uploaded on 05/11/2023

Notes on Probability
Peter J. Cameron


  1. Covariance, correlation. Means and variances of linear functions of random variables.
  2. Limiting distributions in the Binomial case.

These course notes explain the material in the syllabus. They have been “field-tested” on the class of 2000. Many of the examples are taken from the course homework sheets or past exam papers.

Set books The notes cover only material in the Probability I course. The textbooks listed below will be useful for other courses on probability and statistics. You need at most one of the three textbooks listed below, but you will need the statistical tables.

  • Probability and Statistics for Engineering and the Sciences by Jay L. Devore (fifth edition), published by Wadsworth.

Chapters 2–5 of this book are very close to the material in the notes, both in order and notation. However, the lectures go into more detail at several points, especially proofs. If you find the course difficult then you are advised to buy this book, read the corresponding sections straight after the lectures, and do extra exercises from it. Other books which you can use instead are:

  • Probability and Statistics in Engineering and Management Science by W. W. Hines and D. C. Montgomery, published by Wiley, Chapters 2–8.
  • Mathematical Statistics and Data Analysis by John A. Rice, published by Wadsworth, Chapters 1–4.

You should also buy a copy of

  • New Cambridge Statistical Tables by D. V. Lindley and W. F. Scott, published by Cambridge University Press.

You need to become familiar with the tables in this book, which will be provided for you in examinations. All of these books will also be useful to you in the courses Statistics I and Statistical Inference. The next book is not compulsory but introduces the ideas in a friendly way:

  • Taking Chances: Winning with Probability, by John Haigh, published by Oxford University Press.


Web resources Course material for the MAS108 course is kept on the Web at the address

http://www.maths.qmw.ac.uk/~pjc/MAS108/

This includes a preliminary version of these notes, together with coursework sheets, test and past exam papers, and some solutions. Other web pages of interest include

http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/pdf.html

A textbook Introduction to Probability, by Charles M. Grinstead and J. Laurie Snell, available free, with many exercises.

http://www.math.uah.edu/stat/

The Virtual Laboratories in Probability and Statistics, a set of web-based resources for students and teachers of probability and statistics, where you can run simulations etc.

http://www.newton.cam.ac.uk/wmy2kposters/july/

The Birthday Paradox (poster in the London Underground, July 2000).

http://www.combinatorics.org/Surveys/ds5/VennEJC.html

An article on Venn diagrams by Frank Ruskey, with history and many nice pictures.

Web pages for other Queen Mary maths courses can be found from the on-line version of the Maths Undergraduate Handbook.

Peter J. Cameron December 2000

Contents

  • 1 Basic ideas
    • 1.1 Sample space, events
    • 1.2 What is probability?
    • 1.3 Kolmogorov’s Axioms
    • 1.4 Proving things from the axioms
    • 1.5 Inclusion-Exclusion Principle
    • 1.6 Other results about sets
    • 1.7 Sampling
    • 1.8 Stopping rules
    • 1.9 Questionnaire results
    • 1.10 Independence
    • 1.11 Mutual independence
    • 1.12 Properties of independence
    • 1.13 Worked examples
  • 2 Conditional probability
    • 2.1 What is conditional probability?
    • 2.2 Genetics
    • 2.3 The Theorem of Total Probability
    • 2.4 Sampling revisited
    • 2.5 Bayes’ Theorem
    • 2.6 Iterated conditional probability
    • 2.7 Worked examples
  • 3 Random variables
    • 3.1 What are random variables?
    • 3.2 Probability mass function
    • 3.3 Expected value and variance
    • 3.4 Joint p.m.f. of two random variables
    • 3.5 Some discrete random variables
    • 3.6 Continuous random variables
    • 3.7 Median, quartiles, percentiles
    • 3.8 Some continuous random variables
    • 3.9 On using tables
    • 3.10 Worked examples
  • 4 More on joint distribution
    • 4.1 Covariance and correlation
    • 4.2 Conditional random variables
    • 4.3 Joint distribution of continuous r.v.s
    • 4.4 Transformation of random variables
    • 4.5 Worked examples
  • A Mathematical notation
  • B Probability and random variables

CHAPTER 1. BASIC IDEAS

On this point, Albert Einstein wrote, in his 1905 paper On a heuristic point of view concerning the production and transformation of light (for which he was awarded the Nobel Prize),

In calculating entropy by molecular-theoretic methods, the word “probability” is often used in a sense differing from the way the word is defined in probability theory. In particular, “cases of equal probability” are often hypothetically stipulated when the theoretical methods employed are definite enough to permit a deduction rather than a stipulation.

In other words: Don’t just assume that all outcomes are equally likely, especially when you are given enough information to calculate their probabilities!

An event is a subset of S. We can specify an event by listing all the outcomes that make it up. In the above example, let A be the event ‘more heads than tails’ and B the event ‘heads on last throw’. Then

A = {HHH, HHT, HTH, THH}, B = {HHH, HTH, THH, TTH}.

The probability of an event is calculated by adding up the probabilities of all the outcomes comprising that event. So, if all outcomes are equally likely, we have

P(A) = |A| / |S|.

In our example, both A and B have probability 4/8 = 1/2.

An event is simple if it consists of just a single outcome, and is compound otherwise. In the example, A and B are compound events, while the event ‘heads on every throw’ is simple (as a set, it is {HHH}). If A = {a} is a simple event, then the probability of A is just the probability of the outcome a, and we usually write P(a), which is simpler to write than P({a}). (Note that a is an outcome, while {a} is an event, indeed a simple event.)

We can build new events from old ones:

  • A ∪ B (read ‘A union B’) consists of all the outcomes in A or in B (or both!);
  • A ∩ B (read ‘A intersection B’) consists of all the outcomes in both A and B;
  • A \ B (read ‘A minus B’) consists of all the outcomes in A but not in B;
  • A′ (read ‘A complement’) consists of all outcomes not in A (that is, S \ A);
  • ∅ (read ‘empty set’) for the event which doesn’t contain any outcomes.

Note the backward-sloping slash; this is not the same as either a vertical slash | or a forward slash /.

In the example, A′ is the event ‘more tails than heads’, and A ∩ B is the event {HHH, THH, HTH}. Note that P(A ∩ B) = 3/8; this is not equal to P(A) · P(B), despite what you read in some books!
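The event calculations above can be checked by enumeration. The following is a minimal sketch, assuming the example is three tosses of a fair coin with all eight outcomes equally likely (as the listed outcomes suggest); the helper name `prob` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of three tosses, written as strings such as "HHT".
S = {"".join(t) for t in product("HT", repeat=3)}

# A: more heads than tails; B: heads on the last throw.
A = {s for s in S if s.count("H") > s.count("T")}
B = {s for s in S if s[-1] == "H"}

def prob(event):
    """P(event) = |event| / |S|; valid only when all outcomes are equally likely."""
    return Fraction(len(event), len(S))

print(sorted(A & B))       # ['HHH', 'HTH', 'THH']
print(prob(A), prob(B))    # 1/2 1/2
print(prob(A & B))         # 3/8
print(prob(A) * prob(B))   # 1/4, so P(A ∩ B) differs from P(A) · P(B) here
```

Python's set operators `|`, `&` and `-` correspond to ∪, ∩ and \ respectively, and `S - A` gives the complement A′.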

1.2 What is probability?

There is really no answer to this question.

Some people think of it as ‘limiting frequency’. That is, to say that the probability of getting heads when a coin is tossed is 1/2 means that, if the coin is tossed many times, it is likely to come down heads about half the time. But if you toss a coin 1000 times, you are not likely to get exactly 500 heads. You wouldn’t be surprised to get only 495. But what about 450, or 100?

Some people would say that you can work out probability by physical arguments, like the one we used for a fair coin. But this argument doesn’t work in all cases, and it doesn’t explain what probability means.

Some people say it is subjective. You say that the probability of heads in a coin toss is 1/2 because you have no reason for thinking either heads or tails more likely; you might change your view if you knew that the owner of the coin was a magician or a con man. But we can’t build a theory on something subjective.

We regard probability as a mathematical construction satisfying some axioms (devised by the Russian mathematician A. N. Kolmogorov). We develop ways of doing calculations with probability, so that (for example) we can calculate how unlikely it is to get 480 or fewer heads in 1000 tosses of a fair coin. The answer agrees well with experiment.
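As an illustration of the kind of calculation meant here, the chance of 480 or fewer heads in 1000 tosses can be found by direct counting. This is only a sketch under the assumption that all 2^1000 sequences of tosses are equally likely; the function name `prob_at_most` is hypothetical, and the counting approach is not taken from the notes.

```python
from fractions import Fraction
from math import comb

def prob_at_most(k, n):
    """Exact P(at most k heads in n tosses of a fair coin).

    comb(n, i) counts the sequences with exactly i heads, and there are
    2**n equally likely sequences in total.
    """
    favourable = sum(comb(n, i) for i in range(k + 1))
    return Fraction(favourable, 2 ** n)

p = prob_at_most(480, 1000)
print(float(p))  # a little over 0.1
```

Using `Fraction` keeps the arithmetic exact until the final conversion to a float.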

1.3 Kolmogorov’s Axioms

Remember that an event is a subset of the sample space S. A number of events, say A₁, A₂, . . ., are called mutually disjoint or pairwise disjoint if Aᵢ ∩ Aⱼ = ∅ for any two of the events Aᵢ and Aⱼ; that is, no two of the events overlap.

According to Kolmogorov’s axioms, each event A has a probability P(A), which is a number. These numbers satisfy three axioms:

Axiom 1: For any event A, we have P(A) ≥ 0.

Axiom 2: P(S) = 1.

1.4 Proving things from the axioms

(each contains only one element which is in none of the others), and A₁ ∪ A₂ ∪ · · · ∪ Aₙ = A; so by Axiom 3a, we have

P(A) = P(a₁) + P(a₂) + · · · + P(aₙ).

Corollary 1.2 If the sample space S is finite, say S = {a₁, . . . , aₙ}, then

P(a₁) + P(a₂) + · · · + P(aₙ) = 1.

For P(a₁) + P(a₂) + · · · + P(aₙ) = P(S) by Proposition 1.1, and P(S) = 1 by Axiom 2. Notice that once we have proved something, we can use it on the same basis as an axiom to prove further facts.

Now we see that, if all the n outcomes are equally likely, and their probabilities sum to 1, then each has probability 1/n, that is, 1/|S|. Now going back to Proposition 1.1, we see that, if all outcomes are equally likely, then

P(A) = |A| / |S|

for any event A, justifying the principle we used earlier.

Proposition 1.3 P(A′) = 1 − P(A) for any event A.

Let A₁ = A and A₂ = A′ (the complement of A). Then A₁ ∩ A₂ = ∅ (that is, the events A₁ and A₂ are disjoint), and A₁ ∪ A₂ = S. So

P(A₁) + P(A₂) = P(A₁ ∪ A₂) (Axiom 3)
              = P(S)
              = 1 (Axiom 2).

So P(A) = P(A₁) = 1 − P(A₂), that is, P(A′) = 1 − P(A).

Corollary 1.4 P(A) ≤ 1 for any event A.

For 1 − P(A) = P(A′) by Proposition 1.3, and P(A′) ≥ 0 by Axiom 1; so 1 − P(A) ≥ 0, from which we get P(A) ≤ 1.

Remember that if you ever calculate a probability to be less than 0 or more than 1, you have made a mistake!

Corollary 1.5 P(∅) = 0.

For ∅ = S′, so P(∅) = 1 − P(S) by Proposition 1.3; and P(S) = 1 by Axiom 2, so P(∅) = 0.


Here is another result. The notation A ⊆ B means that A is contained in B, that is, every outcome in A also belongs to B.

Proposition 1.6 If A ⊆ B, then P(A) ≤ P(B).

This time, take A₁ = A, A₂ = B \ A. Again we have A₁ ∩ A₂ = ∅ (since the elements of B \ A are, by definition, not in A), and A₁ ∪ A₂ = B. So by Axiom 3,

P(A₁) + P(A₂) = P(A₁ ∪ A₂) = P(B).

In other words, P(A) + P(B \ A) = P(B). Now P(B \ A) ≥ 0 by Axiom 1; so

P(A) ≤ P(B),

as we had to show.
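The results proved so far can be checked numerically on a small finite sample space. The sketch below assumes a hypothetical three-outcome space with unequal probabilities (the values 1/2, 1/3, 1/6 are chosen purely for illustration) and computes P of an event as the sum of the probabilities of its outcomes.

```python
from fractions import Fraction

# Hypothetical outcome probabilities: non-negative and summing to 1.
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
S = set(p)

def prob(event):
    """P(event) as the sum of the probabilities of its outcomes."""
    return sum((p[x] for x in event), Fraction(0))

A = {"a"}
B = {"a", "c"}  # A is contained in B

print(prob(S))                     # 1     (Corollary 1.2)
print(prob(S - A) == 1 - prob(A))  # True  (Proposition 1.3)
print(prob(A) <= 1)                # True  (Corollary 1.4)
print(prob(set()))                 # 0     (Corollary 1.5)
print(prob(A) <= prob(B))          # True  (Proposition 1.6)
```

A check like this is not a proof, but it is a quick way to catch a mis-stated identity.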

1.5 Inclusion-Exclusion Principle

















[Figure: Venn diagram of two overlapping sets A and B]

A Venn diagram for two sets A and B suggests that, to find the size of A ∪ B, we add the size of A and the size of B, but then we have included the size of A ∩ B twice, so we have to take it off. In terms of probability:

Proposition 1.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

We now prove this from the axioms, using the Venn diagram as a guide. We see that A ∪ B is made up of three parts, namely

A₁ = A ∩ B, A₂ = A \ B, A₃ = B \ A.

Indeed we do have A ∪ B = A₁ ∪ A₂ ∪ A₃, since anything in A ∪ B is in both these sets or just the first or just the second. Similarly we have A₁ ∪ A₂ = A and A₁ ∪ A₃ = B. The sets A₁, A₂, A₃ are mutually disjoint. (We have three pairs of sets to check. Now A₁ ∩ A₂ = ∅, since all elements of A₁ belong to B but no elements of A₂ do. The arguments for the other two pairs are similar – you should do them yourself.)
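The three-part decomposition can be tried out on a concrete pair of events. This sketch reuses the earlier events ‘more heads than tails’ and ‘heads on last throw’, assuming three tosses of a fair coin; the variable names `A1`, `A2`, `A3` mirror the sets in the proof.

```python
from fractions import Fraction
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}
A = {s for s in S if s.count("H") >= 2}  # more heads than tails
B = {s for s in S if s.endswith("H")}    # heads on last throw

def prob(event):
    """|event| / |S|, assuming equally likely outcomes."""
    return Fraction(len(event), len(S))

# The three parts used in the proof.
A1, A2, A3 = A & B, A - B, B - A
print(A1 | A2 | A3 == A | B)                        # True: the parts cover A ∪ B
print(not (A1 & A2 or A1 & A3 or A2 & A3))          # True: the parts are disjoint
print(prob(A | B))                                  # 5/8
print(prob(A) + prob(B) - prob(A & B))              # 5/8
```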


1.7 Sampling

I have four pens in my desk drawer; they are red, green, blue, and purple. I draw a pen; each pen has the same chance of being selected. In this case, S = {R, G, B, P}, where R means ‘red pen chosen’ and so on. In this case, if A is the event ‘red or green pen chosen’, then

P(A) = |A| / |S| = 2/4 = 1/2.

More generally, if I have a set of n objects and choose one, with each one equally likely to be chosen, then each of the n outcomes has probability 1/n, and an event consisting of m of the outcomes has probability m/n. What if we choose more than one pen? We have to be more careful to specify the sample space. First, we have to say whether we are

  • sampling with replacement, or
  • sampling without replacement.

Sampling with replacement means that we choose a pen, note its colour, put it back and shake the drawer, then choose a pen again (which may be the same pen as before or a different one), and so on until the required number of pens have been chosen. If we choose two pens with replacement, the sample space is

{RR, RG, RB, RP, GR, GG, GB, GP, BR, BG, BB, BP, PR, PG, PB, PP}

The event ‘at least one red pen’ is {RR, RG, RB, RP, GR, BR, PR}, and has probability 7/16.

Sampling without replacement means that we choose a pen but do not put it back, so that our final selection cannot include two pens of the same colour. In this case, the sample space for choosing two pens is

{RG, RB, RP, GR, GB, GP, BR, BG, BP, PR, PG, PB}

and the event ‘at least one red pen’ is {RG, RB, RP, GR, BR, PR}, with probability 6/12 = 1/2.


Now there is another issue, depending on whether we care about the order in which the pens are chosen. We will only consider this in the case of sampling without replacement. It doesn’t really matter in this case whether we choose the pens one at a time or simply take two pens out of the drawer; and we are not interested in which pen was chosen first. So in this case the sample space is

{{R, G}, {R, B}, {R, P}, {G, B}, {G, P}, {B, P}},

containing six elements. (Each element is written as a set since, in a set, we don’t care which element is first, only which elements are actually present. So the sample space is a set of sets!) The event ‘at least one red pen’ is {{R, G}, {R, B}, {R, P}}, with probability 3/6 = 1/2. We should not be surprised that this is the same as in the previous case.
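The three sample spaces for choosing two pens can be generated mechanically. The sketch below assumes equally likely outcomes in each case and uses Python's `itertools` to build the spaces; the helper name `prob_red` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

pens = "RGBP"

def prob_red(sample_space):
    """P(at least one red pen), assuming all outcomes are equally likely."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if "R" in o]
    return Fraction(len(favourable), len(outcomes))

with_replacement = prob_red(product(pens, repeat=2))  # ordered, 16 outcomes
ordered = prob_red(permutations(pens, 2))             # ordered, 12 outcomes
unordered = prob_red(combinations(pens, 2))           # unordered, 6 outcomes

print(with_replacement, ordered, unordered)  # 7/16 1/2 1/2
```

Here `product` models sampling with replacement, `permutations` models ordered sampling without replacement, and `combinations` models unordered sampling without replacement.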

There are formulae for the sample space size in these three cases. These involve the following functions:

n! = n(n − 1)(n − 2) · · · 1
ⁿPₖ = n(n − 1)(n − 2) · · · (n − k + 1)
ⁿCₖ = ⁿPₖ / k!

Note that n! is the product of all the whole numbers from 1 to n; and

ⁿPₖ = n! / (n − k)!,

so that

ⁿCₖ = n! / (k!(n − k)!).

Theorem 1.10 The number of selections of k objects from a set of n objects is given in the following table.

                      with replacement    without replacement
ordered sample        nᵏ                  ⁿPₖ
unordered sample                          ⁿCₖ

In fact the number that goes in the empty box is ⁿ⁺ᵏ⁻¹Cₖ, but this is much harder to prove than the others, and you are very unlikely to need it.

Here are the proofs of the other three cases. First, for sampling with replacement and ordered sample, there are n choices for the first object, and n choices for the second, and so on; we multiply the choices for different objects. (Think of the choices as being described by a branching tree.) The product of k factors each equal to n is nᵏ.
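The four counts in the table can be compared with brute-force enumeration for small n and k. This is a sketch only, with n = 4 and k = 2 chosen arbitrarily; it relies on `math.perm` and `math.comb` (available from Python 3.8) for the formulae.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

n, k = 4, 2
objects = range(n)

# Each entry pairs the number of enumerated selections with the formula value.
counts = {
    "ordered, with replacement":
        (len(list(product(objects, repeat=k))), n ** k),
    "ordered, without replacement":
        (len(list(permutations(objects, k))), perm(n, k)),
    "unordered, without replacement":
        (len(list(combinations(objects, k))), comb(n, k)),
    "unordered, with replacement":
        (len(list(combinations_with_replacement(objects, k))), comb(n + k - 1, k)),
}

for case, (enumerated, formula) in counts.items():
    print(f"{case}: {enumerated} enumerated, {formula} by formula")
```

For n = 4 and k = 2 the four cases give 16, 12, 6 and 10 selections respectively.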


Consider sampling with replacement. Then |S| = 20¹⁰. What is |A|? The number of ways in which we can choose first five red balls and then five blue ones (that is, RRRRRBBBBB), is 10⁵ · 10⁵ = 10¹⁰. But there are many other ways to get five red and five blue balls. In fact, the five red balls could appear in any five of the ten draws. This means that there are ¹⁰C₅ = 252 different patterns of five Rs and five Bs. So we have

|A| = 252 · 10¹⁰,

and so

P(A) = 252 · 10¹⁰ / 20¹⁰ = 0.246...

Now consider sampling without replacement. If we regard the sample as being ordered, then |S| = ²⁰P₁₀. There are ¹⁰P₅ ways of choosing five of the ten red balls, and the same for the ten blue balls, and as in the previous case there are ¹⁰C₅ patterns of red and blue balls. So

|A| = (¹⁰P₅)² · ¹⁰C₅,

and

P(A) = (¹⁰P₅)² · ¹⁰C₅ / ²⁰P₁₀ = 0.343...

If we regard the sample as being unordered, then |S| = ²⁰C₁₀. There are ¹⁰C₅ choices of the five red balls and the same for the blue balls. We no longer have to count patterns since we don’t care about the order of the selection. So

|A| = (¹⁰C₅)²,

and

P(A) = (¹⁰C₅)² / ²⁰C₁₀ = 0.343...

This is the same answer as in the case before, as it should be; the question doesn’t care about order of choices!

So the event is more likely if we sample without replacement: the probability is 0.343..., against 252/2¹⁰ = 0.246... with replacement.
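The three calculations can be reproduced with exact arithmetic. The sketch below assumes, as the counts in the text indicate, ten red and ten blue balls, ten draws, and the event ‘five red and five blue’; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb, perm

# With replacement: 20**10 ordered outcomes, 252 colour patterns of 10**10 each.
with_replacement = Fraction(comb(10, 5) * 10**10, 20**10)

# Without replacement, ordered sample.
ordered = Fraction(perm(10, 5) ** 2 * comb(10, 5), perm(20, 10))

# Without replacement, unordered sample.
unordered = Fraction(comb(10, 5) ** 2, comb(20, 10))

print(float(with_replacement))     # about 0.2461
print(float(ordered))              # about 0.3437
print(ordered == unordered)        # True: order makes no difference
print(ordered > with_replacement)  # True
```

Comparing the fractions directly avoids any rounding doubt about which probability is larger.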

Example I have 6 gold coins, 4 silver coins and 3 bronze coins in my pocket. I take out three coins at random. What is the probability that they are all of different material? What is the probability that they are all of the same material?

In this case the sampling is without replacement and the sample is unordered. So |S| = ¹³C₃ = 286. The event that the three coins are all of different material can occur in 6 · 4 · 3 = 72 ways, since we must have one of the six gold coins, and so on. So the probability is 72/286 = 0.252...

The event that the three coins are of the same material can occur in

⁶C₃ + ⁴C₃ + ³C₃ = 20 + 4 + 1 = 25

ways, and the probability is 25/286 = 0.087...
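The counts 286, 72 and 25 can be confirmed by listing every unordered selection of three coins. This is a brute-force sketch, not the method of the notes; it treats the thirteen coins as distinct objects labelled only by material.

```python
from fractions import Fraction
from itertools import combinations

coins = ["gold"] * 6 + ["silver"] * 4 + ["bronze"] * 3

# combinations() selects by position, so coins of the same material
# are still counted as different coins.
samples = list(combinations(coins, 3))

all_different = sum(1 for s in samples if len(set(s)) == 3)
all_same = sum(1 for s in samples if len(set(s)) == 1)

print(len(samples), all_different, all_same)  # 286 72 25
print(float(Fraction(all_different, len(samples))))
print(float(Fraction(all_same, len(samples))))
```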

In a sampling problem, you should first read the question carefully and decide whether the sampling is with or without replacement. If it is without replacement, decide whether the sample is ordered (e.g. does the question say anything about the first object drawn?). If so, then use the formula for ordered samples. If not, then you can use either ordered or unordered samples, whichever is convenient; they should give the same answer. If the sample is with replacement, or if it involves throwing a die or coin several times, then use the formula for sampling with replacement.

1.8 Stopping rules

Suppose that you take a typing proficiency test. You are allowed to take the test up to three times. Of course, if you pass the test, you don’t need to take it again. So the sample space is

S = {p, fp, ffp, fff},

where for example ffp denotes the outcome that you fail twice and pass on your third attempt. If all outcomes were equally likely, then your chance of eventually passing the test and getting the certificate would be 3/4.

But it is unreasonable here to assume that all the outcomes are equally likely. For example, you may be very likely to pass on the first attempt. Let us assume that the probability that you pass the test is 0.8. (By Proposition 1.3, your chance of failing is 0.2.) Let us further assume that, no matter how many times you have failed, your chance of passing at the next attempt is still 0.8. Then we have

P(p) = 0.8,
P(fp) = 0.2 · 0.8 = 0.16,
P(ffp) = 0.2² · 0.8 = 0.032,
P(fff) = 0.2³ = 0.008.

Thus the probability that you eventually get the certificate is P({p, fp, ffp}) = 0.8 + 0.16 + 0.032 = 0.992. Alternatively, you eventually get the certificate unless you fail three times, so the probability is 1 − 0.008 = 0.992.

A stopping rule is a rule of the type described here, namely, continue the experiment until some specified occurrence happens. The experiment may potentially be infinite.
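The outcome probabilities for this stopping rule can be generated for any pass probability and any maximum number of attempts. The sketch below assumes, as in the text, that the chance of passing is the same at every attempt; the function name `outcome_probabilities` is hypothetical.

```python
from fractions import Fraction

def outcome_probabilities(p_pass, max_attempts):
    """Map each outcome ('p', 'fp', ..., 'ff...f') to its probability."""
    q = 1 - p_pass
    # i failures followed by a pass, for i = 0, ..., max_attempts - 1.
    probs = {"f" * i + "p": q**i * p_pass for i in range(max_attempts)}
    # The remaining outcome: failing every attempt.
    probs["f" * max_attempts] = q**max_attempts
    return probs

probs = outcome_probabilities(Fraction(4, 5), 3)
p_certificate = sum(v for outcome, v in probs.items() if outcome.endswith("p"))

print({outcome: float(v) for outcome, v in probs.items()})
print(float(p_certificate))     # 0.992
print(float(1 - probs["fff"]))  # 0.992
print(sum(probs.values()) == 1) # True
```

Passing 0.8 as the exact fraction 4/5 keeps the sums free of floating-point rounding.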