STAT/MATH 511 PROBABILITY
Fall, 2009
Lecture Notes

Joshua M. Tebbs
Department of Statistics
University of South Carolina

TABLE OF CONTENTS

2 Probability
  2.1 Introduction
  2.2 Sample spaces
  2.3 Basic set theory
  2.4 Properties of probability
  2.5 Discrete probability models and events
  2.6 Tools for counting sample points
    2.6.1 The multiplication rule
    2.6.2 Permutations
    2.6.3 Combinations
  2.7 Conditional probability
  2.8 Independence
  2.9 Law of Total Probability and Bayes Rule

3 Discrete Distributions
  3.1 Random variables
  3.2 Probability distributions for discrete random variables
  3.3 Mathematical expectation
  3.4 Variance
  3.5 Moment generating functions
  3.6 Binomial distribution
  3.7 Geometric distribution
  3.8 Negative binomial distribution
  3.9 Hypergeometric distribution
  3.10 Poisson distribution

2 Probability

Complementary reading: Chapter 2 (WMS).

2.1 Introduction

TERMINOLOGY: The text defines probability as a measure of one's belief in the occurrence of a future (random) event. Probability is also known as "the mathematics of uncertainty."

REAL LIFE EVENTS: Here are some events we may wish to assign probabilities to:

• tomorrow's temperature exceeding 80 degrees
• getting a flat tire on my way home today
• a new policy holder making a claim in the next year
• the NASDAQ losing 5 percent of its value this week
• you being diagnosed with prostate/cervical cancer in the next 20 years.

ASSIGNING PROBABILITIES: How do we assign probabilities to events? There are three general approaches.

1. Subjective approach.
   • This approach is based on feeling and may not even be scientific.
2. Relative frequency approach.
   • This approach can be used when some random phenomenon is observed repeatedly under identical conditions.
3. Axiomatic/Model-based approach. This is the approach we will take in this course.
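To make the relative-frequency approach concrete, here is a short Python sketch (not part of the original notes); it mirrors the setup of Example 2.1 and Figure 2.1 below, namely 1000 rolls of a fair die while tracking the running proportion of rolls that show a "2", which should settle near 1/6 ≈ 0.167. The function name and seed are illustrative choices.

```python
import random

def running_proportion_of_twos(n_rolls=1000, seed=0):
    """Roll a fair die n_rolls times; return the running proportion of 2s."""
    rng = random.Random(seed)
    count = 0
    proportions = []
    for i in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 2:
            count += 1
        proportions.append(count / i)
    return proportions

props = running_proportion_of_twos()
print(props[-1])   # final relative frequency, typically close to 1/6 ~ 0.167
```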
Figure 2.1: The relative frequency of die rolls which result in a "2"; each plot represents 1000 simulated rolls of a fair die. [Four panels; axes: trial number versus proportion of 2s. Plots omitted.]

Example 2.1. Relative frequency approach. Suppose that we roll a die 1000 times and record the number of times we observe a "2." Let A denote this event. The relative frequency approach says that

P(A) ≈ (number of times A occurs)/(number of trials performed) = n(A)/n,

where n(A) denotes the frequency of the event, and n denotes the number of trials performed. The proportion n(A)/n is called the relative frequency. The symbol P(A) is shorthand for "the probability that A occurs."

RELATIVE FREQUENCY APPROACH: Continuing with our example, suppose that n(A) = 158. We would then estimate P(A) by 158/1000 = 0.158. If we performed the experiment of rolling a die repeatedly, the relative frequency approach says that

n(A)/n → P(A), as n → ∞.

Of course, if the die is fair, then n(A)/n → P(A) = 1/6. ¤

2.2 Sample spaces

TERMINOLOGY: Suppose that a random experiment is performed and that we observe an outcome from the experiment (e.g., rolling a die). The set of all possible outcomes for an experiment is called the sample space and is denoted by S.

Example 2.2. In each of the following random experiments, we write out a corresponding sample space.

(a) The Michigan state lottery calls for a three-digit integer to be selected:
S = {000, 001, 002, ..., 998, 999}.

(b) A USC student is tested for chlamydia (0 = negative, 1 = positive):
S = {0, 1}.

(c) An industrial experiment consists of observing the lifetime of a battery, measured in hours. Different sample spaces are:
S1 = {w : w ≥ 0}
S2 = {0, 1, 2, 3, ...}
S3 = {defective, not defective}.

Sample spaces are not unique; in fact, how we describe the sample space has a direct influence on how we assign probabilities to outcomes in this space. ¤

2.3 Basic set theory

TERMINOLOGY: A countable set A is a set whose elements can be put into a one-to-one correspondence with N = {1, 2, ...}, the set of natural numbers. A set that is not countable is said to be uncountable.

TERMINOLOGY: Countable sets can be further divided into two types.

• A countably infinite set has an infinite number of elements.
• A countably finite set has a finite number of elements.

2.4 Properties of probability

KOLMOGOROV AXIOMS OF PROBABILITY: Given a nonempty sample space S, the measure P(A) is a set function satisfying three axioms:

(1) P(A) ≥ 0, for every A ⊆ S
(2) P(S) = 1
(3) If A1, A2, ... is a countable sequence of pairwise disjoint events (i.e., Ai ∩ Aj = ∅, for i ≠ j) in S, then

P(A1 ∪ A2 ∪ · · ·) = P(A1) + P(A2) + · · · .

RESULTS: The following results are important properties of the probability set function P(·), and each one follows from the Kolmogorov axioms just stated. All events below are assumed to be subsets of a nonempty sample space S.

1. Complement rule: For any event A,

P(Ā) = 1 − P(A).

Proof. Note that S = A ∪ Ā. Thus, since A and Ā are disjoint, P(A ∪ Ā) = P(A) + P(Ā) by Axiom 3. By Axiom 2, P(S) = 1. Thus,

1 = P(S) = P(A ∪ Ā) = P(A) + P(Ā). ¤

2. P(∅) = 0.

Proof. Take A = ∅ and Ā = S. Use the last result and Axiom 2. ¤

3. Monotonicity property: Suppose that A and B are two events such that A ⊂ B. Then, P(A) ≤ P(B).

Proof. Write B = A ∪ (B ∩ Ā). Clearly, A and (B ∩ Ā) are disjoint. Thus, by Axiom 3,

P(B) = P(A) + P(B ∩ Ā).
Because P (B ∩ A) ≥ 0, we are done. ¤ PAGE 6 CHAPTER 2 STAT/MATH 511, J. TEBBS 4. For any event A, P (A) ≤ 1. Proof. Since A ⊂ S, this follows from the monotonicity property and Axiom 2. ¤ 5. Inclusion-exclusion: Suppose that A and B are two events. Then, P (A ∪B) = P (A) + P (B)− P (A ∩B). Proof. Write A ∪ B = A ∪ (A ∩ B). Then, since A and (A ∩ B) are disjoint, by Axiom 3, P (A ∪B) = P (A) + P (A ∩B). Now, write B = (A ∩ B) ∪ (A ∩ B). Clearly, (A ∩ B) and (A ∩ B) are disjoint. Thus, again, by Axiom 3, P (B) = P (A ∩B) + P (A ∩B). Combining the last expressions for P (A ∪B) and P (B) gives the result. ¤ Example 2.6. The probability that train 1 is on time is 0.95, and the probability that train 2 is on time is 0.93. The probability that both are on time is 0.90. (a) What is the probability that at least one train is on time? Solution: Denote by Ai the event that train i is on time, for i = 1, 2. Then, P (A1 ∪ A2) = P (A1) + P (A2)− P (A1 ∩ A2) = 0.95 + 0.93− 0.90 = 0.98. (b) What is the probability that neither train is on time? Solution: By DeMorgan’s Law, P (A1 ∩ A2) = P (A1 ∪ A2) = 1− P (A1 ∪ A2) = 1− 0.98 = 0.02. ¤ EXTENSION : The inclusion-exclusion formula can be extended to any finite sequence of sets A1, A2, ..., An. For example, if n = 3, P (A1 ∪ A2 ∪ A3) = P (A1) + P (A2) + P (A3)− P (A1 ∩ A2)− P (A1 ∩ A3) − P (A2 ∩ A3) + P (A1 ∩ A2 ∩ A3). PAGE 7 CHAPTER 2 STAT/MATH 511, J. TEBBS In general, the inclusion-exclusion formula can be written for any finite sequence: P ( n⋃ i=1 Aj ) = n∑ i=1 P (Ai)− ∑ i1<i2 P (Ai1 ∩ Ai2) + ∑ i1<i2<i3 P (Ai1 ∩ Ai2 ∩ Ai3)− · · ·+ (−1)n+1P (A1 ∩ A2 ∩ · · · ∩ An). Of course, if the sets A1, A2, ..., An are pairwise disjoint, then we arrive back at P ( n⋃ i=1 Ai ) = n∑ i=1 P (Ai), a result implied by Axiom 3 by taking An+1 = An+2 = · · · = ∅. 2.5 Discrete probability models and events TERMINOLOGY : If a sample space for an experiment contains a finite or countable number of sample points, we call it a discrete sample space. • Finite: “number of sample points < ∞.” • Countable: “number of sample points may equal ∞, but can be counted; i.e., sample points may be put into a 1:1 correspondence with N = {1, 2, ..., }.” Example 2.7. A standard roulette wheel contains an array of numbered compartments referred to as “pockets.” The pockets are either red, black, or green. The numbers 1 through 36 are evenly split between red and black, while 0 and 00 are green pockets. On the next play, we are interested in the following events: A1 = {13} A2 = {red} A3 = {0, 00}. TERMINOLOGY : A simple event is an event that can not be decomposed. That is, a simple event corresponds to exactly one sample point. Compound events are those events that contain more than one sample point. In Example 2.7, because A1 contains PAGE 8 CHAPTER 2 STAT/MATH 511, J. TEBBS Example 2.11. In a controlled field experiment, I want to form all possible treatment combinations among the three factors: Factor 1: Fertilizer (60 kg, 80 kg, 100kg: 3 levels) Factor 2: Insects (infected/not infected: 2 levels) Factor 3: Precipitation level (low, high: 2 levels). Here, n1 = 3, n2 = 2, and n3 = 2. Thus, by the multiplication rule, there are n1×n2×n3 = 12 different treatment combinations. ¤ Example 2.12. Suppose that an Iowa license plate consists of seven places; the first three are occupied by letters; the remaining four with numbers. Compute the total number of possible orderings if (a) there are no letter/number restrictions. (b) repetition of letters is prohibited. 
(c) repetition of numbers is prohibited. (d) repetitions of numbers and letters are prohibited. Answers: (a) 26× 26× 26× 10× 10× 10× 10 = 175, 760, 000 (b) 26× 25× 24× 10× 10× 10× 10 = 156, 000, 000 (c) 26× 26× 26× 10× 9× 8× 7 = 88, 583, 040 (d) 26× 25× 24× 10× 9× 8× 7 = 78, 624, 000 2.6.2 Permutations TERMINOLOGY : A permutation is an arrangement of distinct objects in a particular order. Order is important. PAGE 11 CHAPTER 2 STAT/MATH 511, J. TEBBS PROBLEM : Suppose that we have n distinct objects and we want to order (or permute) these objects. Thinking of n slots, we will put one object in each slot. There are • n different ways to choose the object for slot 1, • n− 1 different ways to choose the object for slot 2, • n− 2 different ways to choose the object for slot 3, and so on, down to • 2 different ways to choose the object for slot (n− 1), and • 1 way to choose for the last slot. IMPLICATION : By the multiplication rule, there are n(n − 1)(n − 2) · · · (2)(1) = n! different ways to order (permute) the n distinct objects. Example 2.13. My bookshelf has 10 books on it. How many ways can I permute the 10 books on the shelf? Answer: 10! = 3, 628, 800. ¤ Example 2.14. Now, suppose that in Example 2.13 there are 4 math books, 2 chemistry books, 3 physics books, and 1 statistics book. I want to order the 10 books so that all books of the same subject are together. How many ways can I do this? Solution: Use the multiplication rule. Stage 1 Permute the 4 math books 4! Stage 2 Permute the 2 chemistry books 2! Stage 3 Permute the 3 physics books 3! Stage 4 Permute the 1 statistics book 1! Stage 5 Permute the 4 subjects {m, c, p, s} 4! Thus, there are 4!× 2!× 3!× 1!× 4! = 6912 different orderings. ¤ PAGE 12 CHAPTER 2 STAT/MATH 511, J. TEBBS PERMUTATIONS : With a collection of n distinct objects, we now want to choose and permute r of them (r ≤ n). The number of ways to do this is Pn,r ≡ n! (n− r)! . The symbol Pn,r is read “the permutation of n things taken r at a time.” Proof. Envision r slots. There are n ways to fill the first slot, n−1 ways to fill the second slot, and so on, until we get to the rth slot, in which case there are n− r + 1 ways to fill it. Thus, by the multiplication rule, there are n(n− 1) · · · (n− r + 1) = n! (n− r)! different permutations. ¤ Example 2.15. With a group of 5 people, I want to choose a committee with three members: a president, a vice-president, and a secretary. There are P5,3 = 5! (5− 3)! = 120 2 = 60 different committees possible. Here, note that order is important. ¤ Example 2.16. What happens if the objects to permute are not distinct? Consider the word PEPPER. How many permutations of the letters are possible? Trick: Initially, treat all letters as distinct objects by writing, say, P1E1P2P3E2R. There are 6! = 720 different orderings of these distinct objects. Now, there are 3! ways to permute the P s 2! ways to permute the Es 1! ways to permute the Rs. So, 6! is actually 3!× 2!× 1! times too large. That is, there are 6! 3! 2! 1! = 60 possible permutations. ¤ PAGE 13 CHAPTER 2 STAT/MATH 511, J. TEBBS Proof : Choosing r objects is equivalent to breaking the n objects into two distinguishable groups: Group 1 r chosen Group 2 (n− r) not chosen. There are Cn,r = n! r!(n−r)! ways to do this. ¤ REMARK : We will adopt the notation ( n r ) , read “n choose r,” as the symbol for Cn,r. The terms ( n r ) are called binomial coefficients since they arise in the algebraic expansion of a binomial; viz., (x + y)n = n∑ r=0 ( n r ) xn−ryr. Example 2.19. 
Return to Example 2.15. Now, suppose that we only want to choose 3 committee members from 5 (without designations for president, vice-president, and secretary). Then, there are ( 5 3 ) = 5! 3! (5− 3)! = 5× 4× 3! 3!× 2! = 10 different committees. ¤ NOTE : From Examples 2.15 and 2.19, one should note that Pn,r = r!× Cn,r. Recall that combinations do not regard order as important. Thus, once we have chosen our r objects (there are Cn,r ways to do this), there are then r! ways to permute those r chosen objects. Thus, we can think of a permutation as simply a combination times the number of ways to permute the r chosen objects. Example 2.20. A company receives 20 hard drives. Five of the drives will be randomly selected and tested. If all five are satisfactory, the entire lot will be accepted. Otherwise, the entire lot is rejected. If there are really 3 defectives in the lot, what is the probability of accepting the lot? PAGE 16 CHAPTER 2 STAT/MATH 511, J. TEBBS Solution: First, the number of sample points in S is given by N = ( 20 5 ) = 20! 5! (20− 5)! = 15504. Let A denote the event that the lot is accepted. How many ways can A occur? Use the multiplication rule. Stage 1 Choose 5 good drives from 17 ( 17 5 ) Stage 2 Choose 0 bad drives from 3 ( 3 0 ) By the multiplication rule, there are na = ( 17 5 )× (3 0 ) = 6188 different ways A can occur. Assuming an equiprobability model (i.e., each outcome is equally likely), P (A) = na/N = 6188/15504 ≈ 0.399. ¤ 2.7 Conditional probability MOTIVATION : In some problems, we may be fortunate enough to have prior knowledge about the likelihood of events related to the event of interest. We may want to incorporate this information into a probability calculation. TERMINOLOGY : Let A and B be events in a nonempty sample space S. The condi- tional probability of A, given that B has occurred, is given by P (A|B) = P (A ∩B) P (B) , provided that P (B) > 0. Example 2.21. A couple has two children. (a) What is the probability that both are girls? (b) What is the probability that both are girls, if the eldest is a girl? PAGE 17 CHAPTER 2 STAT/MATH 511, J. TEBBS Solution: (a) The sample space is given by S = {(M,M), (M, F ), (F, M), (F, F )} and N = 4, the number of sample points in S. Define A1 = {1st born child is a girl}, A2 = {2nd born child is a girl}. Clearly, A1 ∩ A2 = {(F, F )} and P (A1 ∩ A2) = 1/4, assuming that the four outcomes in S are equally likely. Solution: (b) Now, we want P (A2|A1). Applying the definition of conditional proba- bility, we get P (A2|A1) = P (A1 ∩ A2) P (A1) = 1/4 2/4 = 1/2. ¤ Example 2.22. In a certain community, 36 percent of the families own a dog, 22 percent of the families that own a dog also own a cat, and 30 percent of the families own a cat. A family is selected at random. (a) Compute the probability that the family owns both a cat and dog. (b) Compute the probability that the family owns a dog, given that it owns a cat. Solution: Let C = {family owns a cat} and D = {family owns a dog}. From the problem, we are given that P (D) = 0.36, P (C|D) = 0.22 and P (C) = 0.30. In (a), we want P (C ∩D). We have 0.22 = P (C|D) = P (C ∩D) P (D) = P (C ∩D) 0.36 . Thus, P (C ∩D) = 0.36× 0.22 = 0.0792. For (b), we want P (D|C). Simply use the definition of conditional probability: P (D|C) = P (C ∩D) P (C) = 0.0792 0.30 = 0.264. ¤ PAGE 18 CHAPTER 2 STAT/MATH 511, J. TEBBS Example 2.24. A red die and a white die are rolled. Let A = {4 on red die} and B = {sum is odd}. 
Of the 36 outcomes in S, 6 are favorable to A, 18 are favorable to B, and 3 are favorable to A ∩B. Assuming the outcomes are equally likely, 3 36 = P (A ∩B) = P (A)P (B) = 6 36 × 18 36 , and the events A and B are independent. ¤ Example 2.25. In an engineering system, two components are placed in a series; that is, the system is functional as long as both components are. Let Ai; i = 1, 2, denote the event that component i is functional. Assuming independence, the probability the system is functional is then P (A1 ∩ A2) = P (A1)P (A2). If P (Ai) = 0.95, for example, then P (A1 ∩ A2) = 0.95 × 0.95 = 0.9025. If the events A1 and A2 are not independent, we do not have enough information to compute P (A1 ∩ A2). ¤ INDEPENDENCE OF COMPLEMENTS : If A and B are independent events, so are (a) A and B (b) A and B (c) A and B. Proof. We will only prove (a). The other parts follow similarly. P (A ∩B) = P (A|B)P (B) = [1− P (A|B)]P (B) = [1− P (A)]P (B) = P (A)P (B). ¤ EXTENSION : The concept of independence (and independence of complements) can be extended to any finite number of events in S. TERMINOLOGY : Let A1, A2, ..., An denote a collection of n ≥ 2 events in a nonempty sample space S. The events A1, A2, ..., An are said to be mutually independent if for any subcollection of events, say, Ai1 , Ai2 , ..., Aik , 2 ≤ k ≤ n, we have P ( k⋂ j=1 Aij ) = k∏ j=1 P (Aij). PAGE 21 CHAPTER 2 STAT/MATH 511, J. TEBBS Challenge: Come up with a random experiment and three events which are pairwise independent, but not mutually independent. COMMON SETTING : Many experiments consist of a sequence of n trials that are viewed as independent (e.g., flipping a coin 10 times). If Ai denotes the event associated with the ith trial, and the trials are independent, then P ( n⋂ i=1 Ai ) = n∏ i=1 P (Ai). Example 2.26. An unbiased die is rolled six times. Let Ai = {i appears on roll i}, for i = 1, 2, ..., 6. Then, P (Ai) = 1/6, and assuming independence, P (A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ∩ A6) = 6∏ i=1 P (Ai) = (1 6 )6 . Suppose that if Ai occurs, we will call it “a match.” What is the probability of at least one match in the six rolls? Solution: Let B denote the event that there is at least one match. Then, B denotes the event that there are no matches. Now, P (B) = P (A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ∩ A6) = 6∏ i=1 P (Ai) = (5 6 )6 = 0.335. Thus, P (B) = 1− P (B) = 1− 0.335 = 0.665, by the complement rule. Exercise: Generalize this result to an n sided die. What does this probability converge to as n →∞? ¤ 2.9 Law of Total Probability and Bayes Rule SETTING : Suppose A and B are events in a nonempty sample space S. We can express the event A as follows A = (A ∩B) ∪ (A ∩B)︸ ︷︷ ︸ union of disjoint events . PAGE 22 CHAPTER 2 STAT/MATH 511, J. TEBBS By the third Kolmolgorov axiom, P (A) = P (A ∩B) + P (A ∩B) = P (A|B)P (B) + P (A|B)P (B), where the last step follows from the multiplication law of probability. This is called the Law of Total Probability (LOTP). The LOTP is helpful. Sometimes P (A|B), P (A|B), and P (B) may be easily computed with available information whereas computing P (A) directly may be difficult. NOTE : The LOTP follows from the fact that B and B partition S; that is, (a) B and B are disjoint, and (b) B ∪B = S. Example 2.27. An insurance company classifies people as “accident-prone” and “non- accident-prone.” For a fixed year, the probability that an accident-prone person has an accident is 0.4, and the probability that a non-accident-prone person has an accident is 0.2. 
The population is estimated to be 30 percent accident-prone. (a) What is the probability that a new policy-holder will have an accident? Solution: Define A = {policy holder has an accident} and B = {policy holder is accident-prone}. Then, P (B) = 0.3, P (A|B) = 0.4, P (B) = 0.7, and P (A|B) = 0.2. By the LOTP, P (A) = P (A|B)P (B) + P (A|B)P (B) = (0.4)(0.3) + (0.2)(0.7) = 0.26. ¤ (b) Now suppose that the policy-holder does have an accident. What is the probability that he was “accident-prone?” Solution: We want P (B|A). Note that P (B|A) = P (A ∩B) P (A) = P (A|B)P (B) P (A) = (0.4)(0.3) 0.26 = 0.46. ¤ PAGE 23 CHAPTER 2 STAT/MATH 511, J. TEBBS Supplier 3. For each supplier, defective rates are as follows: Supplier 1: 0.01, Supplier 2: 0.02, and Supplier 3: 0.03. The manufacturer observes a defective box of raw material. (a) What is the probability that it came from Supplier 2? (b) What is the probability that the defective did not come from Supplier 3? Solution: (a) Let A = {observe defective box}. Let B1, B2, and B3, respectively, denote the events that the box comes from Supplier 1, 2, and 3. The prior probabilities (ignoring the status of the box) are P (B1) = 0.6 P (B2) = 0.3 P (B3) = 0.1. Note that {B1, B2, B3} partitions the space of possible suppliers. Thus, by Bayes Rule, P (B2|A) = P (A|B2)P (B2) P (A|B1)P (B1) + P (A|B2)P (B2) + P (A|B3)P (B3) = (0.02)(0.3) (0.01)(0.6) + (0.02)(0.3) + (0.03)(0.1) = 0.40. This is the updated (posterior) probability that the box came from Supplier 2 (updated to include the information that the box was defective). Solution: (b) First, compute the posterior probability P (B3|A). By Bayes Rule, P (B3|A) = P (A|B3)P (B3) P (A|B1)P (B1) + P (A|B2)P (B2) + P (A|B3)P (B3) = (0.03)(0.1) (0.01)(0.6) + (0.02)(0.3) + (0.03)(0.1) = 0.20. Thus, P (B3|A) = 1− P (B3|A) = 1− 0.20 = 0.80, by the complement rule. ¤ NOTE : Read Sections 2.11 (Numerical Events and Random Variables) and 2.12 (Random Sampling) in WMS. PAGE 26 CHAPTER 3 STAT/MATH 511, J. TEBBS 3 Discrete Distributions Complementary reading: Chapter 3 (WMS), except § 3.10 and § 3.11. 3.1 Random variables PROBABILISTIC DEFINITION : A random variable Y is a function whose domain is the sample space S and whose range is the set of real numbers R = {y : −∞ < y < ∞}. That is, Y : S →R takes sample points in S and assigns them a real number. WORKING DEFINITION : In simpler terms, a random variable is a variable whose observed value is determined by chance. Example 3.1. Suppose that an experiment consists of flipping two fair coins. The sample space is S = {(H, H), (H, T ), (T, H), (T, T )}. Let Y denote the number of heads observed. Before we perform the experiment, we do not know, with certainty, the value of Y . We can, however, list out the possible values of Y corresponding to each sample point: Ei Y (Ei) = y Ei Y (Ei) = y (H, H) 2 (T, H) 1 (H, T ) 1 (T, T ) 0 For each sample point Ei, Y takes on a numerical value specific to Ei. This is precisely why we can think of Y as a function; i.e., Y [(H, H)] = 2 Y [(H, T )] = 1 Y [(T, H)] = 1 Y [(T, T )] = 0, so that P (Y = 2) = P [(H, H)] = 1/4 P (Y = 1) = P [(H, T )] + P [(T, H)] = 1/4 + 1/4 = 1/2 P (Y = 0) = P [(T, T )] = 1/4. PAGE 27 CHAPTER 3 STAT/MATH 511, J. TEBBS NOTE : From these probability calculations; note that we can • work on the sample space S and compute probabilities from S, or • work on R and compute probabilities for events {Y ∈ B}, where B ⊂ R. NOTATION : We denote a random variable Y using a capital letter. 
We denote an observed value of Y by y, a lowercase letter. This is standard notation. For example, if Y denotes the weight (in ounces) of the next newborn boy in Columbia, SC, then Y is random variable. After the baby is born, we observe that the baby weighs y = 128 oz. 3.2 Probability distributions for discrete random variables TERMINOLOGY : The support of a random variable Y is set of all possible values that Y can assume. We will denote the support set by R. TERMINOLOGY : If the random variable Y has a support set R that is countable (finitely or infinitely), we call Y a discrete random variable. Example 3.2. An experiment consists of rolling an unbiased die. Consider the two random variables: X = face value on the first roll Y = number of rolls needed to observe a six. The support of X is RX = {1, 2, 3, 4, 5, 6}. The support of Y is RY = {1, 2, 3, ...}. RX is finitely countable and RY is infinitely countable; thus, both X and Y are discrete. ¤ GOAL: For a discrete random variable Y , we would like to find P (Y = y) for any y ∈ R. Mathematically, pY (y) ≡ P (Y = y) = ∑ P [Ei ∈ S : Y (Ei) = y], for all y ∈ R. PAGE 28 CHAPTER 3 STAT/MATH 511, J. TEBBS Example 3.4. An experiment consists of rolling an unbiased die until the first “6” is observed. Let Y denote the number of rolls needed. The support is R = {1, 2, ...}. Assuming independent trials, we have P (Y = 1) = 1 6 P (Y = 2) = 5 6 × 1 6 P (Y = 3) = 5 6 × 5 6 × 1 6 ; Recognizing the pattern, we see that the pmf for Y is given by pY (y) =    1 6 ( 5 6 )y−1 , y = 1, 2, ... 0, otherwise. This pmf is depicted in a probability histogram below: 0 5 10 15 20 25 30 y 0.00 0.05 0.10 0.15 p( y) =P (Y =y ) Question: Is this a valid pmf; i.e., do the probabilities pY (y) sum to one? Note that ∑ y∈R pY (y) = ∞∑ y=1 1 6 ( 5 6 )y−1 = ∞∑ x=0 1 6 ( 5 6 )x = ( 1 6 1− 5 6 ) = 1. ¤ PAGE 31 CHAPTER 3 STAT/MATH 511, J. TEBBS IMPORTANT : In the last calculation, we have used an important fact concerning infi- nite geometric series; namely, if a is any real number and |r| < 1. Then, ∞∑ x=0 arx = a 1− r . We will use this fact many times in this course! Exercise: Find the probability that the first “6” is observed on (a) an odd-numbered roll (b) an even-numbered roll. Which event is more likely? ¤ 3.3 Mathematical expectation TERMINOLOGY : Let Y be a discrete random variable with pmf pY (y) and support R. The expected value of Y is given by E(Y ) = ∑ y∈R ypY (y). The expected value for discrete random variable Y is simply a weighted average of the possible values of Y . Each support point y is weighted by the probability pY (y). ASIDE : When R is a countably infinite set, then the sum ∑ y∈R ypY (y) may not exist (not surprising since sometimes infinite series do diverge). Mathematically, we require the sum above to be absolutely convergent; i.e., ∑ y∈R |y|pY (y) < ∞. If this is true, we say that E(Y ) exists. If this is not true, then we say that E(Y ) does not exist. Note: If R is a finite set, then E(Y ) always exists, because a finite sum of finite quantities is always finite. Example 3.5. Let the random variable Y have pmf pY (y) =    1 10 (5− y), y = 1, 2, 3, 4 0, otherwise. PAGE 32 CHAPTER 3 STAT/MATH 511, J. TEBBS The expected value of Y is given by E(Y ) = ∑ y∈R ypY (y) = 4∑ y=1 y [ 1 10 (5− y) ] = 1(4/10) + 2(3/10) + 3(2/10) + 4(1/10) = 2. 
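As a quick numerical check of Example 3.5 (a sketch, not part of the original notes), the following Python lines verify that the pmf sums to one and reproduce E(Y) = 2; exact fractions avoid any rounding.

```python
from fractions import Fraction

# pmf from Example 3.5: p(y) = (5 - y)/10 for y = 1, 2, 3, 4
pmf = {y: Fraction(5 - y, 10) for y in range(1, 5)}

total = sum(pmf.values())                  # should equal 1 (valid pmf)
mean = sum(y * p for y, p in pmf.items())  # E(Y) as a weighted average
print(total, mean)                         # prints 1 and 2
```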
¤ INTERPRETATION : The quantity E(Y ) has many interpretations: (a) the “center of gravity” of a probability distribution (b) a long-run average (c) the first moment of the random variable (d) the mean of a population. FUNCTIONS OF Y : Let Y be a discrete random variable with pmf pY (y) and support R. Suppose that g is a real-valued function. Then, g(Y ) is a random variable and E[g(Y )] = ∑ y∈R g(y)pY (y). The proof of this result is given on pp 93 (WMS). Again, we require that ∑ y∈R |g(y)|pY (y) < ∞. If this is not true, then E[g(Y )] does not exist. Example 3.6. In Example 3.5, find E(Y 2) and E(eY ). Solution: The functions g1(Y ) = Y 2 and g2(Y ) = e Y are real functions of Y . From the definition, we have E(Y 2) = ∑ y∈R y2pY (y) = 4∑ y=1 y2 [ 1 10 (5− y) ] = 12(4/10) + 22(3/10) + 32(2/10) + 42(1/10) = 5. Also, E(eY ) = ∑ y∈R eypY (y) = 4∑ y=1 ey [ 1 10 (5− y) ] = e1(4/10) + e2(3/10) + e3(2/10) + e4(1/10) ≈ 12.78. ¤ PAGE 33 CHAPTER 3 STAT/MATH 511, J. TEBBS (b) σ2 = 0 if and only if the random variable Y has a degenerate distribution; i.e., all the probability mass is located at one support point. (c) The larger (smaller) σ2 is, the more (less) spread in the possible values of Y about the mean µ = E(Y ). (d) σ2 is measured in (units)2 and σ is measured in the original units. VARIANCE COMPUTING FORMULA: Let Y be a random variable with (finite) mean E(Y ) = µ. Then V (Y ) = E[(Y − µ)2] = E(Y 2)− [E(Y )]2. Proof. Expand the (Y − µ)2 term and distribute the expectation operator as follows: E[(Y − µ)2] = E(Y 2 − 2µY + µ2) = E(Y 2)− 2µE(Y ) + µ2 = E(Y 2)− 2µ2 + µ2 = E(Y 2)− µ2. ¤ Example 3.9. The discrete uniform distribution. Suppose that the random variable X has pmf pX(x) =    1/m, x = 1, 2, ..., m 0, otherwise, where m is a positive integer larger than 1. Find the variance of X. Solution. We find σ2 = V (X) using the variance computing formula. In Example 3.7, we computed µ = E(X) = m + 1 2 . We first find E(X2): E(X2) = ∑ x∈R x2pX(x) = m∑ x=1 x2 ( 1 m ) = 1 m m∑ x=1 x2 = 1 m [ m(m + 1)(2m + 1) 6 ] = (m + 1)(2m + 1) 6 . PAGE 36 CHAPTER 3 STAT/MATH 511, J. TEBBS We have used the well-known fact that ∑m x=1 x 2 = m(m + 1)(2m + 1)/6; this can be proven by induction. The variance of X is equal to σ2 = E(X2)− [E(X)]2 = (m + 1)(2m + 1) 6 − ( m + 1 2 )2 = m2 − 1 12 . ¤ Exercise: Find σ2 = V (Y ) in Examples 3.5 and 3.8 (notes). IMPORTANT RESULT : Let Y be a random variable (not necessarily a discrete random variable). Suppose that a and b are fixed constants. Then V (a + bY ) = b2V (Y ). REMARK : Taking b = 0 above, we see that V (a) = 0, for any constant a. This makes sense intuitively. The variance is a measure of variability for a random variable; a constant (such as a) does not vary. Also, by taking a = 0, we see that V (bY ) = b2V (Y ). 3.5 Moment generating functions TERMINOLOGY : Let Y be a discrete random variable with pmf pY (y) and support R. The moment generating function (mgf) for Y , denoted by mY (t), is given by mY (t) = E(e tY ) = ∑ y∈R etypY (y), provided E(etY ) < ∞ for all t in an open neighborhood about 0; i.e., there exists some h > 0 such that E(etY ) < ∞ for all t ∈ (−h, h). If E(etY ) does not exist in an open neighborhood of 0, we say that the moment generating function does not exist. TERMINOLOGY : We call µ′k ≡ E(Y k) the kth moment of the random variable Y : E(Y ) 1st moment (mean!) E(Y 2) 2nd moment E(Y 3) 3rd moment E(Y 4) 4th moment ... ... PAGE 37 CHAPTER 3 STAT/MATH 511, J. TEBBS REMARK : The moment generating function (mgf) can be used to generate moments. 
In fact, from the theory of Laplace transforms, it follows that if the mgf exists, it char- acterizes an infinite set of moments. So, how do we generate moments? RESULT : Let Y denote a random variable (not necessarily a discrete random variable) with support R and mgf mY (t). Then, E(Y k) = dkmY (t) dtk ∣∣∣∣∣ t=0 . Note that derivatives are taken with respect to t. Proof. Assume, without loss, that Y is discrete. With k = 1, we have d dt mY (t) = d dt ∑ y∈R etypY (y) = ∑ y∈R d dt etypY (y) = ∑ y∈R yetypY (y) = E(Y e tY ). Thus, dmY (t) dt ∣∣∣∣∣ t=0 = E(Y etY ) ∣∣∣ t=0 = E(Y ). Continuing to take higher-order derivatives, we can prove that dkmY (t) dtk ∣∣∣∣∣ t=0 = E(Y k), for any integer k ≥ 1. See pp 139-140 (WMS) for a slightly different proof. ¤ ASIDE : In the proof of the last result, we interchanged the derivative and (possibly infinite) sum. This is permitted as long as mY (t) = E(e tY ) exists. MEANS AND VARIANCES : Suppose that Y is a random variable (not necessarily a discrete random variable) with mgf mY (t). We know that E(Y ) = dmY (t) dt ∣∣∣∣∣ t=0 and E(Y 2) = d2mY (t) dt2 ∣∣∣∣∣ t=0 . We can get V (Y ) using V (Y ) = E(Y 2)− [E(Y )]2. PAGE 38 CHAPTER 3 STAT/MATH 511, J. TEBBS 3.6 Binomial distribution BERNOULLI TRIALS : Many processes can be envisioned as consisting of a sequence of “trials,” where (i) each trial results in a “success” or a “failure,” (ii) the trials are independent, and (iii) the probability of “success,” denoted by p, 0 < p < 1, is the same on every trial. TERMINOLOGY : In a sequence of n Bernoulli trials, denote by Y the number of suc- cesses out of n (where n is fixed). We say that Y has a binomial distribution with number of trials n and success probability p. Shorthand notation is Y ∼ b(n, p). Example 3.12. Each of the following situations could be conceptualized as a binomial experiment. Are you satisfied with the Bernoulli assumptions in each instance? (a) We flip a fair coin 10 times and let Y denote the number of tails in 10 flips. Here, Y ∼ b(n = 10, p = 0.5). (b) Forty percent of all plots of land respond to a certain treatment. I have four plots to be treated. If Y is the number of plots that respond to the treatment, then Y ∼ b(n = 4, p = 0.4). (c) In rural Kenya, the prevalence rate for HIV is estimated to be around 8 percent. Let Y denote the number of HIV infecteds in a sample of 740 individuals. Here, Y ∼ b(n = 740, p = 0.08). (d) Parts produced by a certain company do not meet specifications (i.e., are defective) with probability 0.001. Let Y denote the number of defective parts in a package of 40. Then, Y ∼ b(n = 40, p = 0.001). ¤ DERIVATION : We now derive the pmf of a binomial random variable. The support of Y is R = {y : y = 0, 1, 2, ..., n}. We need to find an expression for pY (y) = P (Y = y) for each value of y ∈ R. PAGE 41 CHAPTER 3 STAT/MATH 511, J. TEBBS QUESTION : In a sequence of n trials, how can we get exactly y successes? Denoting “success” and “failure” by S and F , respectively, one possible sample point might be SSFSFSFFS · · ·FSF . Because the trials are independent, the probability that we get a particular ordering of y successes and n− y failures is py(1− p)n−y. Furthermore, there are (n y ) sample points that contain exactly y successes. Thus, we add the term py(1− p)n−y a total of (n y ) times to get P (Y = y). The pmf for Y is, for 0 < p < 1, pY (y) =    ( n y ) py(1− p)n−y, y = 0, 1, 2, ..., n 0, otherwise. Example 3.13. In Example 3.12(b), assume that Y ∼ b(n = 4, p = 0.4). 
Here are the probability calculations for this binomial model: P (Y = 0) = pY (0) = ( 4 0 ) (0.4)0(1− 0.4)4−0 = 1× (0.4)0 × (0.6)4 = 0.1296 P (Y = 1) = pY (1) = ( 4 1 ) (0.4)1(1− 0.4)4−1 = 4× (0.4)1 × (0.6)3 = 0.3456 P (Y = 2) = pY (2) = ( 4 2 ) (0.4)2(1− 0.4)4−2 = 6× (0.4)2 × (0.6)2 = 0.3456 P (Y = 3) = pY (3) = ( 4 3 ) (0.4)3(1− 0.4)4−3 = 4× (0.4)3 × (0.6)1 = 0.1536 P (Y = 4) = pY (4) = ( 4 4 ) (0.4)4(1− 0.4)4−4 = 1× (0.4)4 × (0.6)0 = 0.0256. Exercise: What is the probability that at least 2 plots respond? at most one? What are E(Y ) and V (Y )? ¤ Example 3.14. In a small clinical trial with 20 patients, let Y denote the number of patients that respond to a new skin rash treatment. The physicians assume that a binomial model is appropriate and that Y ∼ b(n = 20, p = 0.4). Under this model, compute (a) P (Y = 5), (b) P (Y ≥ 5), and (c) P (Y < 10). (a) P (Y = 5) = pY (5) = ( 20 5 ) (0.4)5(0.6)20−5 = 0.0746. (b) P (Y ≥ 5) = 20∑ y=5 P (Y = y) = 20∑ y=5 ( 20 y ) (0.4)y(0.6)20−y. PAGE 42 CHAPTER 3 STAT/MATH 511, J. TEBBS 0 5 10 15 20 y 0.00 0.05 0.10 0.15 p( y) =P (Y =y ) Figure 3.2: Probability histogram for the number of patients responding to treatment. This represents the b(n = 20, p = 0.4) model in Example 3.14. This calculation involves using the binomial pmf 16 times and adding the results! Trick: Instead of computing the sum ∑20 y=5 ( 20 y ) (0.4)y(0.6)20−y directly, we can write P (Y ≥ 5) = 1− P (Y ≤ 4), by the complement rule. We do this because WMS’s Appendix III (Table 1, pp 839-841) contains binomial probability calculations of the form P (Y ≤ a) = a∑ y=0 ( n y ) py(1− p)n−y, for different n and p. With n = 20 and p = 0.4, we see from Table 1 that P (Y ≤ 4) = 0.051. Thus, P (Y ≥ 5) = 1− 0.051 = 0.949. (c) P (Y < 10) = P (Y ≤ 9) = 0.755, from Table 1. ¤ PAGE 43 CHAPTER 3 STAT/MATH 511, J. TEBBS GEOMETRIC PMF : The pmf for Y ∼ geom(p) is given by pY (y) =    (1− p)y−1p, y = 1, 2, 3, ... 0, otherwise. RATIONALE : The form of this pmf makes intuitive sense; we first need y − 1 failures (each of which occurs with probability 1 − p), and then a success on the yth trial (this occurs with probability p). By independence, we multiply (1− p)× (1− p)× · · · × (1− p)︸ ︷︷ ︸ y−1 failures ×p = (1− p)y−1p. NOTE : Clearly pY (y) > 0 for all y. Does pY (y) sum to one? Note that ∞∑ y=1 (1− p)y−1p = p ∞∑ x=0 (1− p)x = p 1− (1− p) = 1. In the last step, we realized that ∑∞ x=0(1−p)x is an infinite geometric sum with common ratio 1− p. ¤ Example 3.16. Biology students are checking the eye color of fruit flies. For each fly, the probability of observing white eyes is p = 0.25. What is the probability the first white-eyed fly will be observed among the first five flies that are checked? Solution: Let Y denote the number of flies needed to observe the first white-eyed fly. We can envision each fly as a Bernoulli trial (each fly either has white eyes or not). If we assume that the flies are independent, then a geometric model is appropriate; i.e., Y ∼ geom(p = 0.25). We want to compute P (Y ≤ 5). We use the pmf to compute P (Y = 1) = pY (1) = (1− 0.25)1−1(0.25) = 0.25 P (Y = 2) = pY (2) = (1− 0.25)2−1(0.25) ≈ 0.19 P (Y = 3) = pY (3) = (1− 0.25)3−1(0.25) ≈ 0.14 P (Y = 4) = pY (4) = (1− 0.25)4−1(0.25) ≈ 0.11 P (Y = 5) = pY (5) = (1− 0.25)5−1(0.25) ≈ 0.08. Adding these probabilities, we get P (Y ≤ 5) ≈ 0.77. The pmf for the geom(p = 0.25) model is depicted in Figure 3.3. ¤ PAGE 46 CHAPTER 3 STAT/MATH 511, J. 
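A quick check of Example 3.16 (a sketch, not part of the original notes): summing the geometric pmf for y = 1, ..., 5 with p = 0.25 gives about 0.763, which matches the closed form P(Y ≤ 5) = 1 − (1 − p)^5; the rounded terms quoted in the example give the "≈ 0.77" above.

```python
p = 0.25
terms = [(1 - p) ** (y - 1) * p for y in range(1, 6)]  # P(Y = 1), ..., P(Y = 5)
print(sum(terms))           # ~0.7627; rounding each term as in Example 3.16 gives ~0.77
print(1 - (1 - p) ** 5)     # same probability via the complement: P(Y <= 5) = 1 - q^5
```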
TEBBS 0 5 10 15 20 y 0.00 0.05 0.10 0.15 0.20 0.25 p( y) =P (Y =y ) Figure 3.3: Probability histogram for the number of flies needed to find the first white-eyed fly. This represents the geom(p = 0.25) model in Example 3.16. GEOMETRIC MGF : Suppose that Y ∼ geom(p). The mgf of Y is given by mY (t) = pet 1− qet , where q = 1− p, for t < − ln q. Proof. Exercise. ¤ MEAN AND VARIANCE : Differentiating the mgf, we get d dt mY (t) = d dt ( pet 1− qet ) = pet(1− qet)− pet(−qet) (1− qet)2 . Thus, E(Y ) = d dt mY (t) ∣∣∣∣ t=0 = pe0(1− qe0)− pe0(−qe0) (1− qe0)2 = p(1− q)− p(−q) (1− q)2 = 1 p . Similar (but lengthier) calculations show E(Y 2) = d2 dt2 mY (t) ∣∣∣∣ t=0 = 1 + q p2 . PAGE 47 CHAPTER 3 STAT/MATH 511, J. TEBBS Finally, V (Y ) = E(Y 2)− [E(Y )]2 = 1 + q p2 − ( 1 p )2 = q p2 . ¤ NOTE : WMS derive the geometric mean and variance using a different approach (not using the mgf). See pp 116-117. ¤ Example 3.17. At an orchard in Maine, “20-lb” bags of apples are weighed. Suppose that four percent of the bags are underweight and that each bag weighed is independent. If Y denotes the the number of bags observed to find the first underweight bag, then Y ∼ geom(p = 0.04). The mean of Y is E(Y ) = 1 p = 1 0.04 = 25 bags. The variance of Y is V (Y ) = q p2 = 0.96 (0.04)2 = 600 (bags)2. ¤ 3.8 Negative binomial distribution NOTE : The negative binomial distribution can be motivated from two perspectives: • as a generalization of the geometric • as an “inverse” version of the binomial. TERMINOLOGY : Imagine an experiment where Bernoulli trials are observed. If Y denotes the trial on which the rth success occurs, r ≥ 1, then Y has a negative binomial distribution with waiting parameter r and probability of success p. NEGATIVE BINOMIAL PMF : The pmf for Y ∼ nib(r, p) is given by pY (y) =    ( y−1 r−1 ) pr(1− p)y−r, y = r, r + 1, r + 2, ... 0, otherwise. Of course, when r = 1, the nib(r, p) pmf reduces to the geom(p) pmf. PAGE 48 CHAPTER 3 STAT/MATH 511, J. TEBBS Now that we are finished with the lemma, let’s find the mgf of Y ∼ nib(r, p). With q = 1− p, we have mY (t) = E(e tY ) = ∞∑ y=r ety ( y − 1 r − 1 ) prqy−r = ∞∑ y=r et(y−r)etr ( y − 1 r − 1 ) prqy−r = (pet)r ∞∑ y=r ( y − 1 r − 1 ) (qet)y−r = (pet)r(1− qet)−r. ¤ REMARK : Showing that the nib(r, p) pmf sums to one can be done by using a similar series expansion as above. We omit it for brevity. MEAN AND VARIANCE : For Y ∼ nib(r, p), with q = 1− p, E(Y ) = r p and V (Y ) = rq p2 . 3.9 Hypergeometric distribution SETTING : Consider a collection of N objects (e.g., people, poker chips, plots of land, etc.) and suppose that we have two dichotomous classes, Class 1 and Class 2. For example, the objects and classes might be Poker chips red/blue People infected/not infected Plots of land respond to treatment/not. From the collection of N objects, we sample n of them (without replacement), and record Y , the number of objects in Class 1. REMARK : This sounds like a binomial setup! However, the difference here is that N , the population size, is finite (the population size, theoretically, is assumed to be infinite in the binomial model). Thus, if we sample from a population of objects without replace- ment, the “success” probability changes from trial to trial. This, violates the binomial PAGE 51 CHAPTER 3 STAT/MATH 511, J. TEBBS model assumptions! 
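A small sketch of the point just made (not part of the original notes): when we sample without replacement from a finite population, the conditional "success" probability changes from draw to draw, which is exactly what the Bernoulli-trial assumption behind the binomial model forbids. The population numbers below (N = 20 objects, r = 8 in Class 1) are illustrative only.

```python
from fractions import Fraction

N, r = 20, 8   # population size and number of "Class 1" objects (illustrative values)

# Sampling WITHOUT replacement: the success probability depends on earlier draws.
p1 = Fraction(r, N)                  # success probability on draw 1
p2_given_success = Fraction(r - 1, N - 1)   # draw 2, after removing one success
p2_given_failure = Fraction(r, N - 1)       # draw 2, after removing one failure

print(p1, p2_given_success, p2_given_failure)   # 2/5, 7/19, 8/19 -- not constant

# Sampling WITH replacement keeps the success probability at r/N on every draw,
# which is the constant-p assumption of the binomial model.
```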
If N is large (i.e., in a very large population), the hypergeometric and binomial models will be similar, because the change in the probability of success from trial to trial will be small (maybe so small that it is not of practical concern). HYPERGEOMETRIC DISTRIBUTION : Envision a collection of n objects sampled (at random and without replacement) from a population of size N , where r denotes the size of Class 1 and N − r denotes the size of Class 2. Let Y denote the number of objects in the sample that belong to Class 1. Then, Y has a hypergeometric distribution, written Y ∼ hyper(N, n, r), where N = total number of objects r = number of the 1st class (e.g., “success”) N − r = number of the 2nd class (e.g., “failure”) n = number of objects sampled. HYPERGEOMETRIC PMF : The pmf for Y ∼ hyper(N, n, r) is given by pY (y) =    (ry)( N−r n−y) (Nn) , y ∈ R 0, otherwise, where the support set R = {y ∈ N : max(0, n−N + r) ≤ y ≤ min(n, r)}. BREAKDOWN : In the hyper(N, n, r) pmf, we have three parts: ( r y ) = number of ways to choose y Class 1 objects from r ( N−r n−y ) = number of ways to choose n− y Class 2 objects from N − r ( N n ) = number of sample points. REMARK : The hypergeometric pmf pY (y) does sum to 1 over the support R, but we omit this proof for brevity (see Exercise 3.216, pp 156, WMS). Example 3.19. In my fish tank at home, there are 50 fish. Ten have been tagged. If I catch 7 fish (and random, and without replacement), what is the probability that exactly two are tagged? Solution. Here, N = 50 (total number of fish), n = 7 (sample size), r = 10 (tagged PAGE 52 CHAPTER 3 STAT/MATH 511, J. TEBBS fish; Class 1), N − r = 40 (untagged fish; Class 2), and y = 2 (number of tagged fish caught). Thus, P (Y = 2) = pY (2) = ( 10 2 )( 40 5 ) ( 50 7 ) = 0.2964. What about the probability that my catch contains at most two tagged fish? Solution. Here, we want P (Y ≤ 2) = P (Y = 0) + P (Y = 1) + P (Y = 2) = ( 10 0 )( 40 7 ) ( 50 7 ) + ( 10 1 )( 40 6 ) ( 50 7 ) + ( 10 2 )( 40 5 ) ( 50 7 ) = 0.1867 + 0.3843 + 0.2964 = 0.8674. ¤ Example 3.20. A supplier ships parts to a company in lots of 25 parts. The company has an acceptance sampling plan which adopts the following acceptance rule: “....sample 5 parts at random and without replacement. If there are no de- fectives in the sample, accept the entire lot; otherwise, reject the entire lot.” Let Y denote the number of defectives in the sample. Then, Y ∼ hyper(25, 5, r), where r denotes the number defectives in the lot (in real life, r would be unknown). Define OC(p) = P (Y = 0) = ( r 0 )( 25−r 5 ) ( 25 5 ) , where p = r/25 denotes the true proportion of defectives in the lot. The symbol OC(p) denotes the probability of accepting the lot (which is a function of p). Consider the following table, whose entries are computed using the above probability expression: r p OC(p) 0 0 1.00 1 0.04 0.80 2 0.08 0.63 3 0.12 0.50 4 0.16 0.38 5 0.20 0.29 10 0.40 0.06 15 0.60 0.01 PAGE 53 CHAPTER 3 STAT/MATH 511, J. TEBBS • By Property (2), we know that the probability of one event in any one subinterval is proportional to the subinterval’s length, say λ/n, where λ is the proportionality constant. • By Property (3), the probability of more than one occurrence in any subinterval is zero (for n large). • Consider the occurrence/non-occurrence of an event in each subinterval as a Bernoulli trial. Then, by Property (1), we have a sequence of n Bernoulli tri- als, each with probability of “success” p = λ/n. 
Thus, a binomial (approximate) calculation gives P (Y = y) ≈ ( n y )( λ n )y ( 1− λ n )n−y . To improve the approximation for P (Y = y), we let n get large without bound. Then, lim n→∞ P (Y = y) = lim n→∞ ( n y )( λ n )y ( 1− λ n )n−y = lim n→∞ n! y!(n− y)!λ y ( 1 n )y ( 1− λ n )n ( 1 1− λ n )y = lim n→∞ n(n− 1) · · · (n− y + 1) ny︸ ︷︷ ︸ an λy y!︸︷︷︸ bn ( 1− λ n )n ︸ ︷︷ ︸ cn ( 1 1− λ n )y ︸ ︷︷ ︸ dn . Now, the limit of the product is the product of the limits: lim n→∞ an = lim n→∞ n(n− 1) · · · (n− y + 1) ny = 1 lim n→∞ bn = lim n→∞ λy y! = λy y! lim n→∞ cn = lim n→∞ ( 1− λ n )n = e−λ lim n→∞ dn = lim n→∞ ( 1 1− λ n )y = 1. We have shown that lim n→∞ P (Y = y) = λye−λ y! . PAGE 56 CHAPTER 3 STAT/MATH 511, J. TEBBS POISSON PMF : A discrete random variable Y is said to follow a Poisson distribution with rate λ if the pmf of Y is given by pY (y) =    λye−λ y! , y = 0, 1, 2, ... 0, otherwise. We write Y ∼ Poisson(λ). NOTE : Clearly pY (y) > 0 for all y ∈ R. That pY (y) sums to one is easily seen as ∑ y∈R pY (y) = ∞∑ y=0 λye−λ y! = e−λ ∞∑ y=0 λy y! = e−λeλ = 1, since ∑∞ y=0 λ y/y! is the McLaurin series expansion of eλ. ¤ EXAMPLES : Discrete random variables that might be modeled using a Poisson distri- bution include (1) the number of customers entering a post office in a given day. (2) the number of α-particles discharged from a radioactive substance in one second. (3) the number of machine breakdowns per month. (4) the number of blemishes on a piece of artificial turf. (5) the number of chocolate chips in a Chips-Ahoy cookie. Example 3.22. The number of cars Y abandoned weekly on a highway is modeled using a Poisson distribution with λ = 2.2. In a given week, what is the probability that (a) no cars are abandoned? (b) exactly one car is abandoned? (c) at most one car is abandoned? (d) at least one car is abandoned? PAGE 57 CHAPTER 3 STAT/MATH 511, J. TEBBS Solutions. We have Y ∼ Poisson(λ = 2.2). (a) P (Y = 0) = pY (0) = (2.2)0e−2.2 0! = e−2.2 = 0.1108 (b) P (Y = 1) = pY (1) = (2.2)1e−2.2 1! = 2.2e−2.2 = 0.2438 (c) P (Y ≤ 1) = P (Y = 0) + P (Y = 1) = pY (0) + pY (1) = 0.1108 + 0.2438 = 0.3456 (d) P (Y ≥ 1) = 1− P (Y = 0) = 1− pY (0) = 1− 0.1108 = 0.8892. ¤ 0 2 4 6 8 10 12 y 0.00 0.05 0.10 0.15 0.20 0.25 p( y) =P (Y =y ) Figure 3.4: Probability histogram for the number of abandoned cars. This represents the Poisson(λ = 2.2) model in Example 3.22. REMARK : WMS’s Appendix III, (Table 3, pp 843-847) includes an impressive table for Poisson probabilities of the form FY (a) = P (Y ≤ a) = a∑ y=0 λye−λ y! . Recall that this function is called the cumulative distribution function of Y . This makes computing compound event probabilities much easier. PAGE 58 CHAPTER 3 STAT/MATH 511, J. TEBBS RELATIONSHIP : Suppose that Y ∼ b(n, p). If n is large and p is small, then pY (y) = ( n y ) py(1− p)n−y ≈ λ ye−λ y! , for y ∈ R = {0, 1, 2, ..., n}, where λ = np. Example 3.25. Hepatitis C (HCV) is a viral infection that causes cirrhosis and cancer of the liver. Since HCV is transmitted through contact with infectious blood, screening donors is important to prevent further transmission. The World Health Organization has projected that HCV will be a major burden on the US health care system before the year 2020. For public-health reasons, researchers take a sample of n = 1875 blood donors and screen each individual for HCV. If 3 percent of the entire population is infected, what is the probability that 50 or more are HCV-positive? Solution. Let Y denote the number of HCV-infected individuals in our sample. 
We compute the probability P (Y ≥ 50) using both the binomial and Poisson models. • Binomial: Here, n = 1875 and p = 0.03. Thus, P (Y ≥ 50) = 1875∑ y=50 ( 1875 y ) (0.03)y(0.97)1875−y ≈ 0.818783. • Poisson: Here, λ = np = 1875(0.03) ≈ 56.25. Thus, P (Y ≥ 50) = ∞∑ y=50 (56.25)ye−56.25 y! ≈ 0.814932. As we can see, the Poisson approximation is quite good. ¤ RELATIONSHIP : One can see that the hypergeometric, binomial, and Poisson models are related in the following way: hyper(N,n, r) ←→ b(n, p) ←→ Poisson(λ). The first link results when N is large and r/N → p. The second link results when n is large and p is small so that λ/n → p. When these situations are combined, as you might suspect, one can approximate the hypergeometric model with a Poisson model! PAGE 61 CHAPTER 4 STAT/MATH 511, J. TEBBS 4 Continuous Distributions Complementary reading from WMS: Chapter 4. 4.1 Introduction RECALL: In Chapter 3, we focused on discrete random variables. A discrete random variable Y can assume a finite or (at most) a countable number of values. We also learned about probability mass functions (pmfs). These functions tell us what probabilities to assign to each of the support points in R (a countable set). PREVIEW : Continuous random variables have support sets that are not countable. In fact, most often, the support set for a continuous random variable Y is an interval of real numbers; e.g., R = {y : 0 ≤ y ≤ 1}, R = {y : 0 < y < ∞}, R = {y : −∞ < y < ∞}, etc. Thus, probabilities of events involving continuous random variables must be assigned in a different way. 4.2 Cumulative distribution functions TERMINOLOGY : The (cumulative) distribution function (cdf) of a random vari- able Y , denoted by FY (y), is given by the probability FY (y) = P (Y ≤ y), for all −∞ < y < ∞. Note that the cdf is defined for all y ∈ R (the set of all real numbers), not just for those values of y ∈ R (the support of Y ). Every random variable, discrete or continuous, has a cdf. Example 4.1. Suppose that the random variable Y has pmf pY (y) =    1 6 (3− y), y = 0, 1, 2 0, otherwise. PAGE 62 CHAPTER 4 STAT/MATH 511, J. TEBBS We now compute probabilities of the form P (Y ≤ y): • for y < 0, FY (y) = P (Y ≤ y) = 0 • for 0 ≤ y < 1, FY (y) = P (Y ≤ y) = P (Y = 0) = 36 • for 1 ≤ y < 2, FY (y) = P (Y ≤ y) = P (Y = 0) + P (Y = 1) = 36 + 26 = 56 • for y ≥ 2, FY (y) = P (Y ≤ y) = P (Y = 0)+P (Y = 1)+P (Y = 2) = 36 + 26 + 16 = 1. Putting this all together, we have the cdf for Y : FY (y) =    0, y < 0 3 6 , 0 ≤ y < 1 5 6 , 1 ≤ y < 2 1, y ≥ 2. It is instructive to plot the pmf of Y and the cdf of Y side by side. 0 1 2 y 0.0 0.1 0.2 0.3 0.4 0.5 p (y ) −1 0 1 2 3 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 y F (y ) pmf, pY (y) cdf, FY (y) Figure 4.5: Probability mass function pY (y) and cumulative distribution function FY (y) in Example 4.1. • PMF – The height of the bar above y is the probability that Y assumes that value. – For any y not equal to 0, 1, or 2, pY (y) = 0. PAGE 63 CHAPTER 4 STAT/MATH 511, J. TEBBS 4 6 8 10 12 0 .0 0 0 .0 5 0 .1 0 0 .1 5 0 .2 0 0 .2 5 0 .3 0 Male birthweights (in lbs.) 4 6 8 10 12 0 .0 0 0 .0 5 0 .1 0 0 .1 5 0 .2 0 0 .2 5 0 .3 0 Male birthweights (in lbs.) Figure 4.6: Canadian male birth weight data. The histogram (left) is constructed from a sample of n = 1250 subjects. A normal probability density function has been fit to the empirical distribution (right). Example 4.2. 
A team of Montreal researchers who studied the birth weights of five million Canadian babies born between 1981 and 2003 say environmental contaminants may be to blame for a drop in the size of newborn baby boys. A subset (n = 1250 subjects) of the birth weights, measured in lbs, is given in Figure 4.6. ¤ IMPORTANT : Suppose Y is a continuous random variable with pdf fY (y) and cdf FY (y). The probability of an event {Y ∈ B} is computed by integrating fY (y) over B, that is, P (Y ∈ B) = ∫ B fY (y)dy, for any B ⊂ R. If B = {y : a ≤ y ≤ b}; i.e., B = [a, b], then P (Y ∈ B) = P (a ≤ Y ≤ b) = ∫ b a fY (y)dy = ∫ b −∞ fY (y)dy − ∫ a −∞ fY (y)dy = FY (b)− FY (a). Compare these to the analogous results for the discrete case (see page 29 in the notes). In the continuous case, fY (y) replaces pY (y) and integrals replace sums. PAGE 66 CHAPTER 4 STAT/MATH 511, J. TEBBS RECALL: We have already discovered that if Y is a continuous random variable, then P (Y = a) = 0 for any constant a. This can be also seen by writing P (Y = a) = P (a ≤ Y ≤ a) = ∫ a a fY (y)dy = 0, where fY (y) is the pdf of Y . An immediate consequence of this is that if Y is continuous, P (a ≤ Y ≤ b) = P (a ≤ Y < b) = P (a < Y ≤ b) = P (a < Y < b) = ∫ b a fY (y)dy. Example 4.3. Suppose that Y has the pdf fY (y) =    2y, 0 < y < 1 0, otherwise. Find the cdf of Y . Solution. We need to compute FY (y) = P (Y ≤ y) for all y ∈ R. There are three cases to consider: • when y ≤ 0, FY (y) = ∫ y −∞ fY (t)dt = ∫ y −∞ 0dt = 0; • when 0 < y < 1, FY (y) = ∫ y −∞ fY (t)dt = ∫ 0 −∞ 0dt + ∫ y 0 2tdt = 0 + t2 ∣∣∣ y 0 = y2; • when y ≥ 1, FY (y) = ∫ y −∞ fY (t)dt = ∫ 0 −∞ 0dt + ∫ 1 0 2tdt + ∫ y 1 0dt = 0 + 1 + 0 = 1. Putting this all together, we have FY (y) =    0, y < 0 y2, 0 ≤ y < 1 1, y ≥ 1. The pdf fY (y) and the cdf FY (y) are plotted side by side in Figure 4.7. Exercise: Find (a) P (0.3 < Y < 0.7), (b) P (Y = 0.3), and (c) P (Y > 0.7). ¤ PAGE 67 CHAPTER 4 STAT/MATH 511, J. TEBBS 0.0 0.2 0.4 0.6 0.8 1.0 0. 0 0. 5 1. 0 1. 5 2. 0 y f(y ) −0.5 0.0 0.5 1.0 1.5 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 y F( y) pdf, fY (y) cdf, FY (y) Figure 4.7: Probability density function fY (y) and cumulative distribution function FY (y) in Example 4.3. Example 4.4. From the onset of infection, the survival time Y (measured in years) of patients with chronic active hepatitis receiving prednisolone is modeled with the pdf fY (y) =    1 10 e−y/10, y > 0 0, otherwise. Find the cdf of Y . Solution. We need to compute FY (y) = P (Y ≤ y) for all y ∈ R. There are two cases to consider: • when y ≤ 0, FY (y) = ∫ y −∞ fY (t)dt = ∫ y −∞ 0dt = 0; • when y > 0, FY (y) = ∫ y −∞ fY (t)dt = ∫ 0 −∞ 0dt + ∫ y 0 1 10 e−t/10dt = 0 + 1 10 (−10e−t/10) ∣∣∣∣∣ y 0 = 1− e−y/10. PAGE 68 CHAPTER 4 STAT/MATH 511, J. TEBBS Mathematically, we require that ∫ R |y|fY (y)dy < ∞. If this is not true, we say that E(Y ) does not exist. If g is a real-valued function, then g(Y ) is a random variable and E[g(Y )] = ∫ R g(y)fY (y)dy, provided that this integral exists. Example 4.6. Suppose that Y has pdf given by fY (y) =    2y, 0 < y < 1 0, otherwise. Find E(Y ), E(Y 2), and E(ln Y ). Solution. The expected value of Y is given by E(Y ) = ∫ 1 0 yfY (y)dy = ∫ 1 0 y(2y)dy = ∫ 1 0 2y2dy = 2 ( y3 3 ∣∣∣∣ 1 0 ) = 2 ( 1 3 − 0 ) = 2/3. The second moment is E(Y 2) = ∫ 1 0 y2fY (y)dy = ∫ 1 0 y2(2y)dy = ∫ 1 0 2y3dy = 2 ( y4 4 ∣∣∣∣ 1 0 ) = 2 ( 1 4 − 0 ) = 1/2. Finally, E(ln Y ) = ∫ 1 0 ln y(2y)dy. 
To solve this integral, use integration by parts with u = ln y and dv = 2ydy: E(ln Y ) = y2 ln y ∣∣∣∣ 1 0︸ ︷︷ ︸ = 0 − ∫ 1 0 ydy = − ( y2 2 ∣∣∣∣ 1 0 ) = −1 2 . ¤ PAGE 71 CHAPTER 4 STAT/MATH 511, J. TEBBS PROPERTIES OF EXPECTATIONS : Let Y be a continuous random variable with pdf fY (y) and support R, suppose that g, g1, g2, ..., gk are real-valued functions, and let c be any real constant. Then, (a) E(c) = c (b) E[cg(Y )] = cE[g(Y )] (c) E[ ∑k j=1 gj(Y )] = ∑k j=1 E[gj(Y )]. These properties are identical to those we discussed in the discrete case. 4.4.2 Variance TERMINOLOGY : Let Y be a continuous random variable with pdf fY (y), support R, and mean E(Y ) = µ. The variance of Y is given by σ2 ≡ V (Y ) ≡ E[(Y − µ)2] = ∫ R (y − µ)2fY (y)dy. The variance computing formula still applies in the continuous case, that is, V (Y ) = E(Y 2)− [E(Y )]2. Example 4.7. Suppose that Y has pdf given by fY (y) =    2y, 0 < y < 1 0, otherwise. Find σ2 = V (Y ). Solution. We computed E(Y ) = µ = 2/3 in Example 4.6. Using the definition above, V (Y ) = ∫ 1 0 ( y − 2 3 )2 (2y)dy. Instead of doing this integral, it is easier to use the variance computing formula V (Y ) = E(Y 2)− [E(Y )]2. In Example 4.6, we computed the second moment E(Y 2) = 1/2. Thus, V (Y ) = E(Y 2)− [E(Y )]2 = 1 2 − ( 2 3 )2 = 1/18. ¤ PAGE 72 CHAPTER 4 STAT/MATH 511, J. TEBBS 4.4.3 Moment generating functions TERMINOLOGY : Let Y be a continuous random variable with pdf fY (y) and support R. The moment generating function (mgf) for Y , denoted by mY (t), is given by mY (t) = E(e tY ) = ∫ R etyfY (y)dy, provided E(etY ) < ∞ for all t in an open neighborhood about 0; i.e., there exists some h > 0 such that E(etY ) < ∞ for all t ∈ (−h, h). If E(etY ) does not exist in an open neighborhood of 0, we say that the moment generating function does not exist. Example 4.8. Suppose that the pdf of Y is given by fY (y) =    e−y, y > 0 0, otherwise. Find the mgf of Y and use it to compute E(Y ) and V (Y ). Solution. mY (t) = E(e tY ) = ∫ ∞ 0 etyfY (y)dy = ∫ ∞ 0 etye−ydy = ∫ ∞ 0 ety−ydy = ∫ ∞ 0 e−y(1−t)dy = − ( 1 1− t ) e−y(1−t) ∣∣∣∣∣ ∞ y=0 . In the last expression, note that lim y→∞ e−y(1−t) < ∞ if and only if 1− t > 0, i.e., t < 1. Thus, for t < 1, we have mY (t) = − ( 1 1− t ) e−y(1−t) ∣∣∣∣∣ ∞ y=0 = 0 + ( 1 1− t ) = 1 1− t . Note that (−h, h) with h = 1 is an open neighborhood around zero for which mY (t) exists. With the mgf, we can calculate the mean and variance. Differentiating the mgf, PAGE 73 CHAPTER 4 STAT/MATH 511, J. TEBBS 0.0 0.2 0.4 0.6 0.8 1.0 0. 6 0. 8 1. 0 1. 2 1. 4 y f(y ) −0.5 0.0 0.5 1.0 1.5 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 y F( y) pdf, fY (y) cdf, FY (y) Figure 4.9: The U(0, 1) probability density function and cumulative distribution function. 4.6 Normal distribution TERMINOLOGY : A random variable Y is said to have a normal distribution if its pdf is given by fY (y) =    1√ 2πσ e− 1 2 ( y−µ σ )2 , −∞ < y < ∞ 0, otherwise. Shorthand notation is Y ∼ N (µ, σ2). There are two parameters in the normal distribu- tion: the mean E(Y ) = µ and the variance V (Y ) = σ2. FACTS : (a) The N (µ, σ2) pdf is symmetric about µ; that is, for any a ∈ R, fY (µ− a) = fY (µ + a). (b) The N (µ, σ2) pdf has points of inflection located at y = µ± σ (verify!). (c) limy→±∞ fY (y) = 0. PAGE 76 CHAPTER 4 STAT/MATH 511, J. TEBBS TERMINOLOGY : A normal distribution with mean µ = 0 and variance σ2 = 1 is called the standard normal distribution. 
It is conventional to let Z denote a random variable that follows a standard normal distribution; we write Z ∼ N (0, 1). IMPORTANT : Tabled values of the standard normal probabilities are given in Appendix III (Table 4, pp 848) of WMS. This table turns out to be helpful since the integral FY (y) = P (Y ≤ y) = ∫ y −∞ 1√ 2πσ e− 1 2 ( t−µ σ )2 dt does not exist in closed form. Specifically, the table provides values of 1− FZ(z) = P (Z > z) = ∫ ∞ z fZ(u)du, where fZ(u) denotes the nonzero part of the standard normal pdf; i.e., fZ(u) = 1√ 2π e−u 2/2. To use the table, we need to first prove that any N (µ, σ2) distribution can be “trans- formed” to the (standard) N (0, 1) distribution (we’ll see how to do this later). Once we do this, we will see that there is only a need for one table of probabilities. Of course, probabilities like FY (y) = P (Y ≤ y) can be obtained using software too. Example 4.10. Show that the N (µ, σ2) pdf integrates to 1. Proof. Let z = (y − µ)/σ so that dz = dy/σ and dy = σdz. Define I = ∫ ∞ −∞ 1√ 2πσ e− 1 2 ( y−µ σ )2 dy = ∫ ∞ −∞ 1√ 2π e−z 2/2dz. We want to show that I = 1. Since I > 0, it suffices to show that I2 = 1. Note that I2 = ∫ ∞ −∞ 1√ 2π e−x 2/2dx ∫ ∞ −∞ 1√ 2π e−y 2/2dy = 1 2π ∫ ∞ −∞ ∫ ∞ −∞ exp [ − ( x2 + y2 2 )] dxdy. Switching to polar coordinates; i.e., letting x = r cos θ and y = r sin θ, we get x2 + y2 = r2(cos2 θ + sin2 θ) = r2, and dxdy = rdrdθ; i.e., the Jacobian of the transformation from PAGE 77 CHAPTER 4 STAT/MATH 511, J. TEBBS (x, y) space to (r, θ) space. Thus, we write I2 = 1 2π ∫ 2π θ=0 ∫ ∞ r=0 e−r 2/2rdrdθ = 1 2π ∫ 2π θ=0 [∫ ∞ r=0 re−r 2/2dr ] dθ = 1 2π ∫ 2π θ=0 [ − e−r2/2 ∣∣∣∣ ∞ r=0 ] dθ = 1 2π ∫ 2π θ=0 1dθ = θ 2π ∣∣∣ 2π θ=0 = 1. ¤ NORMAL MGF : Suppose that Y ∼ N (µ, σ2). The mgf of Y is mY (t) = exp ( µt + σ2t2 2 ) . Proof. Using the definition of the mgf, we have mY (t) = E(e tY ) = ∫ ∞ −∞ ety 1√ 2πσ e− 1 2 ( y−µ σ )2 dy = 1√ 2πσ ∫ ∞ −∞ ety− 1 2 ( y−µ σ )2 dy. Define b = ty − 1 2 ( y−µ σ )2 , the exponent in the last integral. We are going to rewrite b in the following way: b = ty − 1 2 ( y − µ σ )2 = ty − 1 2σ2 (y2 − 2µy + µ2) = − 1 2σ2 (y2 − 2µy − 2σ2ty + µ2) = − 1 2σ2 [ y2 − 2(µ + σ2t)y︸ ︷︷ ︸ complete the square +µ2 ] = − 1 2σ2 [ y2 − 2(µ + σ2t)y + (µ + σ2t)2 − (µ + σ2t)2︸ ︷︷ ︸ add and subtract +µ2 ] = − 1 2σ2 { [y − (µ + σ2t)]2} + 1 2σ2 [ (µ + σ2t)2 − µ2] = − 1 2σ2 (y − a)2 + 1 2σ2 (µ2 + 2µσ2t + σ4t2 − µ2) = − 1 2σ2 (y − a)2 + µt + σ2t2/2︸ ︷︷ ︸ = c, say , PAGE 78 CHAPTER 4 STAT/MATH 511, J. TEBBS (b) For this model, ninety percent of all contamination levels are above what mercury level? Solution. We want to find φY0.10, the 10th percentile of Y ∼ N (18, 16); i.e., φY0.10 solves FY (φ Y 0.10) = P (Y ≤ φY0.10) = 0.10. We’ll start by finding φZ0.10, the 10th percentile of Z ∼ N (0, 1); i.e., φZ0.10 solves FZ(φ Z 0.10) = P (Z ≤ φZ0.10) = 0.10. From the standard normal table (Table 4), we see that φZ0.10 ≈ −1.28. We are left to solve the equation: φY0.10 − 18 4 = φZ0.10 ≈ −1.28 =⇒ φY0.10 ≈ −1.28(4) + 18 = 12.88. Thus, 90 percent of all contamination levels are greater than 12.88 parts per million. ¤ 4.7 The gamma family of distributions INTRODUCTION : In this section, we examine an important family of probability dis- tributions; namely, those in the gamma family. There are three well-known “named distributions” in this family: • the exponential distribution • the gamma distribution • the χ2 distribution. 
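NOTE (illustrative sketch): Each member of this family can be evaluated numerically, for instance with Python's scipy.stats (assumed available here). The parameter values below are arbitrary choices for demonstration, and the chi-square moment facts quoted in the comments (mean = df, variance = 2·df) are standard results stated for reference.

from scipy import stats

alpha, beta = 3.0, 4.0                       # arbitrary illustration values

gam  = stats.gamma(a=alpha, scale=beta)      # gamma(alpha, beta)
expo = stats.expon(scale=beta)               # exponential(beta); the alpha = 1 case
chi2 = stats.chi2(df=6)                      # chi-square with 6 degrees of freedom

# Means and variances:
print(gam.mean(), gam.var())     # alpha*beta = 12 and alpha*beta**2 = 48
print(expo.mean(), expo.var())   # beta = 4 and beta**2 = 16
print(chi2.mean(), chi2.var())   # 6 and 12

# Probabilities come from the cdf, e.g. P(Y <= 10) under the gamma(3, 4) model:
print(gam.cdf(10.0))

In scipy's parameterization, the scale argument plays the role of β and the shape argument a plays the role of α.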
NOTE : The exponential and gamma distributions are popular models for lifetime ran- dom variables; i.e., random variables that record “time to event” measurements, such as the lifetimes of an electrical component, death times for human subjects, waiting times in Poisson processes, etc. Other lifetime distributions include the lognormal, Weibull, loggamma, among others. PAGE 81 CHAPTER 4 STAT/MATH 511, J. TEBBS 4.7.1 Exponential distribution TERMINOLOGY : A random variable Y is said to have an exponential distribution with parameter β > 0 if its pdf is given by fY (y) =    1 β e−y/β, y > 0 0, otherwise. Shorthand notation is Y ∼ exponential(β). The value of β determines the scale of the distribution, so it is called a scale parameter. Exercise: Show that the exponential pdf integrates to 1. EXPONENTIAL MGF : Suppose that Y ∼ exponential(β). The mgf of Y is given by mY (t) = 1 1− βt, for t < 1/β. Proof. From the definition of the mgf, we have mY (t) = E(e tY ) = ∫ ∞ 0 ety ( 1 β e−y/β ) dy = 1 β ∫ ∞ 0 ety−y/βdy = 1 β ∫ ∞ 0 e−y[(1/β)−t]dy = 1 β { − ( 1 1 β − t ) e−y[(1/β)−t] }∣∣∣∣∣ ∞ y=0 = ( 1 1− βt ) { e−y[(1/β)−t] ∣∣∣∣ 0 y=∞ } . In the last expression, note that lim y→∞ e−y[(1/β)−t] < ∞ if and only if (1/β)− t > 0, i.e., t < 1/β. Thus, for t < 1/β, we have mY (t) = ( 1 1− βt ) e−y[(1/β)−t] ∣∣∣∣∣ 0 y=∞ = ( 1 1− βt ) − 0 = 1 1− βt. Note that (−h, h) with h = 1/β is an open neighborhood around 0 for which mY (t) exists. ¤ PAGE 82 CHAPTER 4 STAT/MATH 511, J. TEBBS 0 500 1000 1500 2000 2500 y, component lifetimes (hours) 0. 00 00 0. 00 05 0. 00 10 0. 00 15 0. 00 20 f(y ) Figure 4.11: The probability density function, fY (y), in Example 4.12. A model for electrical component lifetimes. MEAN AND VARIANCE : Suppose that Y ∼ exponential(β). The mean and variance of Y are given by E(Y ) = β and V (Y ) = β2. Proof: Exercise. ¤ Example 4.12. The lifetime of an electrical component has an exponential distribution with mean β = 500 hours. What is the probability that a randomly selected component fails before 100 hours? lasts between 250 and 750 hours? Solution. With β = 500, the pdf for Y is given by fY (y) =    1 500 e−y/500, y > 0 0, otherwise. This pdf is depicted in Figure 4.11. Thus, the probability of failing before 100 hours is P (Y < 100) = ∫ 100 0 1 500 e−y/500dy ≈ 0.181. PAGE 83 CHAPTER 4 STAT/MATH 511, J. TEBBS 0 2 4 6 8 10 12 0.0 0.1 0.2 0.3 0.4 0.5 y f(y ) 0 2 4 6 8 10 12 0.0 0.1 0.2 0.3 y f(y ) 0 10 20 30 40 50 0.0 0 0.0 2 0.0 4 0.0 6 y f(y ) 0 10 20 30 40 50 0.0 0 0.0 2 0.0 4 0.0 6 y f(y ) Figure 4.12: Four gamma pdfs. Upper left: α = 1, β = 2. Upper right: α = 2, β = 1. Lower left: α = 3, β = 4. Lower right: α = 6, β = 3. REMARK : By changing the values of α and β, the gamma pdf can assume many shapes. This makes the gamma distribution popular for modeling lifetime data. Note that when α = 1, the gamma pdf reduces to the exponential(β) pdf. That is, the exponential pdf is a “special” gamma pdf. Example 4.14. Show that the gamma(α, β) pdf integrates to 1. Solution. Change the variable of integration to u = y/β so that du = dy/β and dy = βdu. We have ∫ ∞ 0 fY (y)dy = ∫ ∞ 0 1 Γ(α)βα yα−1e−y/βdy = 1 Γ(α) ∫ ∞ 0 1 βα (βu)α−1e−βu/ββdu = 1 Γ(α) ∫ ∞ 0 uα−1e−udu = Γ(α) Γ(α) = 1. ¤ PAGE 86 CHAPTER 4 STAT/MATH 511, J. TEBBS GAMMA MGF : Suppose that Y ∼ gamma(α, β). The mgf of Y is mY (t) = ( 1 1− βt )α , for t < 1/β. Proof. 
From the definition of the mgf, we have mY (t) = E(e tY ) = ∫ ∞ 0 ety [ 1 Γ(α)βα yα−1e−y/β ] dy = ∫ ∞ 0 1 Γ(α)βα yα−1ety−y/βdy = ∫ ∞ 0 1 Γ(α)βα yα−1e−y[(1/β)−t]dy = ∫ ∞ 0 1 Γ(α)βα yα−1e−y/[(1/β)−t] −1 dy = ηα βα ∫ ∞ 0 1 Γ(α)ηα yα−1e−y/ηdy, where η = [(1/β) − t]−1. If η > 0 ⇐⇒ t < 1/β, then the last integral equals 1, because the integrand is the gamma(α, η) pdf and integration is over R = {y : 0 < y < ∞}. Thus, mY (t) = ( η β )α = { 1 β[(1/β)− t] }α = ( 1 1− βt )α . Note that (−h, h) with h = 1/β is an open neighborhood around 0 for which mY (t) exists. ¤ MEAN AND VARIANCE : If Y ∼ gamma(α, β), then E(Y ) = αβ and V (Y ) = αβ2. NOTE : Upon closer inspection, we see that the nonzero part of the gamma(α, β) pdf fY (y) = 1 Γ(α)βα yα−1e−y/β consists of two parts: • the kernel of the pdf: yα−1e−y/β • a constant out front: 1/Γ(α)βα. PAGE 87 CHAPTER 4 STAT/MATH 511, J. TEBBS The kernel is the “guts” of the formula, while the constant out front is simply the “right quantity” that makes fY (y) a valid pdf; i.e., the constant which makes fY (y) integrate to 1. Note that because ∫ ∞ 0 1 Γ(α)βα yα−1e−y/βdy = 1, it follows immediately that ∫ ∞ 0 yα−1e−y/βdy = Γ(α)βα. This fact is extremely fascinating in its own right, and it is very helpful too; we will use it repeatedly. Example 4.15. Suppose that Y has pdf given by fY (y) =    cy2e−y/4, y > 0 0, otherwise. (a) What is the value of c that makes this a valid pdf? (b) What is the mgf of Y ? (c) What are the mean and variance of Y ? Solutions. Note that y2e−y/4 is a gamma kernel with α = 3 and β = 4. Thus, the constant out front is c = 1 Γ(α)βα = 1 Γ(3)43 = 1 2(64) = 1 128 . The mgf of Y is mY (t) = ( 1 1− βt )α = ( 1 1− 4t )3 , for t < 1/4. Finally, E(Y ) = αβ = 3(4) = 12 V (Y ) = αβ2 = 3(42) = 48. RELATIONSHIP WITH A POISSON PROCESS : Suppose that we are observing events according to a Poisson process with rate λ = 1/β, and let the random variable W denote the time until the αth occurrence. Then, W ∼ gamma(α, β). PAGE 88 CHAPTER 4 STAT/MATH 511, J. TEBBS 4.8 Beta distribution TERMINOLOGY : A random variable Y is said to have a beta distribution with parameters α > 0 and β > 0 if its pdf is given by fY (y) =    Γ(α+β) Γ(α)Γ(β) yα−1(1− y)β−1, 0 < y < 1 0, otherwise. Since the support of Y is R = {y : 0 < y < 1}, the beta distribution is a popular probability model for proportions. Shorthand notation is Y ∼ beta(α, β). NOTE : Upon closer inspection, we see that the nonzero part of the beta(α, β) pdf fY (y) = Γ(α + β) Γ(α)Γ(β) yα−1(1− y)β−1 consists of two parts: • the kernel of the pdf: yα−1(1− y)β−1 • a constant out front: Γ(α + β)/Γ(α)Γ(β). Again, the kernel is the “guts” of the formula, while the constant out front is simply the “right quantity” that makes fY (y) a valid pdf; i.e., the constant which makes fY (y) integrate to 1. Note that because ∫ 1 0 Γ(α + β) Γ(α)Γ(β) yα−1(1− y)β−1dy = 1, it follows immediately that ∫ 1 0 yα−1(1− y)β−1dy = Γ(α)Γ(β) Γ(α + β) . BETA PDF SHAPES : The beta pdf is very flexible. That is, by changing the values of α and β, we can come up with many different pdf shapes. See Figure 4.13 for examples. • When α = β, the pdf is symmetric about the line y = 1 2 . • When α < β, the pdf is skewed right (i.e., smaller values of y are more likely). PAGE 91 CHAPTER 4 STAT/MATH 511, J. TEBBS Beta(2,1) f(y ) 0.0 0.2 0.4 0.6 0.8 1.0 0. 0 0. 5 1. 0 1. 5 2. 0 Beta(2,2) f(y ) 0.0 0.2 0.4 0.6 0.8 1.0 0. 0 0. 5 1. 0 1. 5 Beta(3,2) f(y ) 0.0 0.2 0.4 0.6 0.8 1.0 0. 0 0. 5 1. 0 1. 
Figure 4.13: Four beta pdfs. Upper left: α = 2, β = 1. Upper right: α = 2, β = 2. Lower left: α = 3, β = 2. Lower right: α = 1, β = 14.

• When α > β, the pdf is skewed left (i.e., larger values of y are more likely).

• When α = β = 1, the beta pdf reduces to the U(0, 1) pdf!

BETA MGF : The beta(α, β) mgf exists, but not in closed form. Hence, we'll compute moments directly.

MEAN AND VARIANCE : If Y ∼ beta(α, β), then

E(Y) = α/(α + β)  and  V(Y) = αβ/[(α + β)²(α + β + 1)].

Proof. We will derive E(Y) only. From the definition of expected value, we have

E(Y) = ∫₀¹ y fY(y) dy = ∫₀¹ y [Γ(α + β)/(Γ(α)Γ(β))] y^(α−1)(1 − y)^(β−1) dy
     = [Γ(α + β)/(Γ(α)Γ(β))] ∫₀¹ y^((α+1)−1)(1 − y)^(β−1) dy.

Note that the last integrand is a beta kernel with parameters α + 1 and β. Because integration is over R = {y : 0 < y < 1}, we have

∫₀¹ y^((α+1)−1)(1 − y)^(β−1) dy = Γ(α + 1)Γ(β)/Γ(α + 1 + β),

and thus, using Γ(x + 1) = xΓ(x),

E(Y) = [Γ(α + β)/(Γ(α)Γ(β))] · [Γ(α + 1)Γ(β)/Γ(α + 1 + β)]
     = [Γ(α + β)/Γ(α)] · [Γ(α + 1)/Γ(α + 1 + β)]
     = [Γ(α + β)/Γ(α)] · [αΓ(α)/((α + β)Γ(α + β))]
     = α/(α + β).

To derive V(Y), first find E(Y²) using similar calculations. Then use the variance computing formula V(Y) = E(Y²) − [E(Y)]² and simplify. ¤

Example 4.17. At a health clinic, suppose that Y, the proportion of individuals infected with a new flu virus (e.g., H1N1, etc.), varies daily according to a beta distribution with pdf

fY(y) = 20(1 − y)^19, 0 < y < 1, and fY(y) = 0 otherwise.

This distribution is displayed in Figure 4.14.

Questions.
(a) What are the parameters in this distribution; i.e., what are α and β?
(b) What is the mean proportion of individuals infected?
(c) Find φ0.95, the 95th percentile of this distribution.
(d) Treating daily infection counts as independent (from day to day), what is the probability that, during any given 5-day span, there are at least 2 days where the infection proportion is above 10 percent?

Solutions.
(a) α = 1 and β = 20.
(b) E(Y) = 1/(1 + 20) ≈ 0.048.
(c) The 95th percentile φ0.95 solves

P(Y ≤ φ0.95) = ∫₀^φ0.95 20(1 − y)^19 dy = 0.95.
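NUMERICAL CHECK (illustrative): Parts (c) and (d) can be verified numerically. The sketch below assumes Python with scipy is available; the variable names are ours, and part (d) uses the fact that, with independent days, the number of days (out of 5) with infection proportion above 0.10 is binomial with n = 5 and success probability p = P(Y > 0.10).

from scipy import stats

a, b = 1, 20
Y = stats.beta(a, b)                     # Y ~ beta(1, 20), so F_Y(y) = 1 - (1 - y)**20

# (c) 95th percentile: solve F_Y(phi) = 0.95.
phi_95 = Y.ppf(0.95)
print(phi_95)                            # same value as 1 - 0.05**(1/20)

# (d) p = P(Y > 0.10) for a single day, then count days across 5 independent days.
p = Y.sf(0.10)                           # survival function; equals 0.9**20
at_least_two = stats.binom.sf(1, 5, p)   # P(at least 2 of the 5 days exceed 10 percent)
print(p, at_least_two)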