
An overview of various probability distributions for discrete and continuous random variables, including the discrete uniform, hypergeometric, binomial, geometric, negative binomial, Poisson, exponential, gamma, Cauchy, and normal distributions. It covers their probability density functions (PDFs) and cumulative distribution functions (CDFs), as well as their properties and applications.


MT426 Notebook 2

Prepared by Professor Jenny Baglivo. © Copyright 2009 by Jenny A. Baglivo. All Rights Reserved.

Contents

2.1 Introduction
    2.1.1 Definitions
2.2 Discrete Random Variables
    2.2.1 Probability Distribution, PDF, CDF
    2.2.2 Discrete Uniform Distribution
    2.2.3 Hypergeometric Distribution
    2.2.4 Bernoulli Experiments, Bernoulli and Binomial Distributions
    2.2.5 Simple Random Samples, Binomial Approximation, Survey Analysis
    2.2.6 Geometric and Negative Binomial Distributions
    2.2.7 Poisson Limit Theorem, Poisson Distribution
2.3 Continuous Random Variables
    2.3.1 Probability Distribution, PDF and CDF
    2.3.2 Quantiles, Percentiles
    2.3.3 Continuous Uniform Distribution
    2.3.4 Exponential Distribution, Relationship to Poisson Process
    2.3.5 Euler Gamma Function, Gamma Distribution
    2.3.6 Distributions Related to Poisson Processes
    2.3.7 Cauchy Distribution
    2.3.8 Normal (Gaussian) Distribution
    2.3.9 Transforming Random Variables

List of Tables

Table 1. Standard normal cumulative probabilities, Φ(z) = P(Z ≤ z), when z ≥ 0.

Cumulative distribution function. The cumulative distribution function (CDF) of the discrete random variable X is defined as follows:

    F(x) = P(X ≤ x) for all real numbers x.

Cumulative distribution functions satisfy the following properties:

1. Infinite Limits: lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.

2. Nondecreasing: If x1 ≤ x2, then F(x1) ≤ F(x2).

3. Right Continuous: lim_{x→a+} F(x) = F(a) for each real number a.

Note that F(x) represents cumulative probability, with limits 0 and 1 (property 1). Cumulative probability increases with increasing x (property 2) and has discrete jumps at the values of x in the range of the random variable (property 3).

Example 1, continued. Let X be the number of heads in eight tosses of a fair coin.

1. Left Plot: The left plot is a probability histogram of the PDF of X. A probability histogram is constructed as follows: for each x in the range of X, a rectangle with base corresponding to the interval [x − 0.50, x + 0.50] and with height p(x) is drawn. The total area is 1.

2. Right Plot: The right plot shows the CDF of X. F(x) is a step function, with a step of height p(x) at each x in the range. Since the range of X is the finite set {0, 1, 2, . . . , 8}, F(x) = 0 when x < 0 and F(x) = 1 when x ≥ 8.

2.2.2 Discrete Uniform Distribution

Let n be a positive integer. The random variable X is said to be a discrete uniform random variable, or to have a discrete uniform distribution, with parameter n when its PDF is as follows:

    p(x) = 1/n when x = 1, 2, . . . , n, and 0 otherwise.

For example, suppose that you roll a fair six-sided die and let X be the number of dots on the top face.
Then X is a discrete uniform random variable with n = 6. The PDF and CDF of X are displayed below:

Example (Hand et al., Chapman & Hall, 1994, page 98). Is a fair die really fair? That is the question that R. Wolf tried to answer when he rolled a "fair" die 20,000 times and recorded the number of dots on the top face each time. The following table summarizes the results of his experiment, where the expected frequency is

    Expected Frequency = 20000 p(x) = 20000 (1/6) ≈ 3333.33

for each x, and the relative error is the following ratio:

    Relative Error = (Observed Frequency − Expected Frequency) / Expected Frequency.

Note that the relative error has been converted to a percentage.

                          x = 1     x = 2     x = 3     x = 4     x = 5     x = 6
    Observed Frequency     3407      3631      3176      2916      3448      3422
    Expected Frequency   3333.33   3333.33   3333.33   3333.33   3333.33   3333.33
    Relative Error (%)    2.21%     8.93%    −4.72%   −12.52%     3.44%     2.66%

The rather large relative errors call the fairness of the die into question.

2.2.3 Hypergeometric Distribution

Let n, M, and N be integers with 0 < M < N and 0 < n < N. The random variable X is said to be a hypergeometric random variable, or to have a hypergeometric distribution, with parameters n, M, and N, when its PDF is as follows:

    p(x) = P(X = x) = C(M, x) C(N − M, n − x) / C(N, n)

when x is an integer between max(0, n + M − N) and min(n, M), and 0 otherwise. (Here C(a, b) denotes the binomial coefficient "a choose b.")

Note that hypergeometric distributions are used to model urn experiments, where N is the number of objects in the urn, M is the number of "special" objects, n is the size of the subset chosen from the urn, and X is the number of special objects in the chosen subset. If each choice of subset is equally likely, then X has a hypergeometric distribution.

Example. A group of twenty-five first graders is to be randomly assigned to two classes: a class of 10 students to be taught by Mrs. Smith and a class of 15 students to be taught by Mr. Jones. Assume that each choice of rosters is equally likely.
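As an aside, the hypergeometric PDF defined above is easy to check numerically. A minimal sketch in Python; the urn sizes below (N = 20, M = 6, n = 5) are illustrative choices of mine, not numbers from these notes:

```python
from math import comb

def hypergeometric_pdf(x, n, M, N):
    """P(X = x) when n objects are drawn from an urn of N objects,
    M of which are special, and X counts special objects drawn."""
    # Outside the range of X the PDF is 0.
    if x < max(0, n + M - N) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Illustrative urn: N = 20 objects, M = 6 special, subset of size n = 5.
probs = [hypergeometric_pdf(x, 5, 6, 20) for x in range(6)]
print([round(p, 4) for p in probs])
print(sum(probs))  # sums to 1 over the range of X
```

The range check mirrors the max(0, n + M − N) ≤ x ≤ min(n, M) condition in the definition.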
Five of the first graders are close friends. Let X be the number of close friends assigned to Mrs. Smith's class. Then X has a hypergeometric distribution, where n = ____, M = ____, and N = ____. The PDF and CDF of X are displayed below:

• The probability that all 5 friends are assigned to the same class is ____

• The probability that exactly 4 of the 5 friends are assigned to the same class is ____

2.2.6 Geometric and Negative Binomial Distributions

Let X be the trial number of the rth success in a sequence of independent Bernoulli trials with success probability p. Then X is said to be a negative binomial random variable, or to have a negative binomial distribution, with parameters r and p. The PDF of X is as follows:

    p(x) = C(x − 1, r − 1) (1 − p)^(x−r) p^r when x = r, r + 1, r + 2, . . ., and 0 otherwise.

Note that, for each x, p(x) is the probability of the event "exactly x − r failures and r successes in x trials, with the last trial a success."

The geometric distribution corresponds to the special case r = 1. If X has a geometric distribution, then its PDF is as follows:

    p(x) = (1 − p)^(x−1) p when x = 1, 2, 3, . . ., and 0 otherwise.

Exercise. Let X be the trial number of the first success in a sequence of independent Bernoulli trials with success probability p. Find simplified formulas for (a) the cumulative probability P(X ≤ x), where x is a nonnegative integer, and (b) the upper tail probability P(X > x), where x is a nonnegative integer.

Exercise. A Bernoulli experiment consists of rolling a fair six-sided die and recording an S (for success) if 1 or 4 dots appear on the top face, and an F (for failure) otherwise.

(a) Let X be the trial number of the first success in a sequence of independent trials of the experiment. Then X has a geometric distribution with parameter p = ____.
The PDF and CDF of X are displayed below:

• The probability of observing 5 or more failures before the first success is ____

(b) Let X be the trial number of the third success in a sequence of independent trials of the experiment. Then X has a negative binomial distribution with parameters r = ____ and p = ____. The PDF and CDF of X are displayed below:

• The probability of observing 4 or fewer failures before the third success is ____

2.2.7 Poisson Limit Theorem, Poisson Distribution

The following limit theorem, proven by S. Poisson in the 1830s, can be used to estimate binomial probabilities when the number of trials is large and the probability of success is small.

Theorem (Poisson Limit Theorem). Let λ ("lambda") be a fixed positive real number, let n be an integer greater than λ, and let p = λ/n. Then

    lim_{n→∞} C(n, x) p^x (1 − p)^(n−x) = e^(−λ) λ^x / x! when x = 0, 1, 2, . . ..

Note that the proof uses the fact that lim_{n→∞} (1 + r/n)^n = e^r for any real number r.

For example, let X be the number of successes in 10,000 independent trials of a Bernoulli experiment whose success probability is p = 1/2500, and consider finding the probability that the binomial random variable X equals 3.

• Using the binomial PDF, the probability is

    P(X = 3) = C(10000, 3) (1/2500)^3 (2499/2500)^9997 ≈ 0.195386.

• Using Poisson's approximation with λ = np = 4,

    P(X = 3) ≈ e^(−4) 4^3 / 3! ≈ 0.195367.

Poisson distribution. Let λ be a positive real number. The random variable X is said to be a Poisson random variable, or to have a Poisson distribution, with parameter λ if its PDF is as follows:

    p(x) = P(X = x) = e^(−λ) λ^x / x! when x = 0, 1, 2, . . ., and 0 otherwise.

Exercise. Use the Maclaurin series for e^r to demonstrate that Σ_{x=0}^{∞} p(x) = 1.

Note, in particular, that if a ∈ R, then P(X = a) = 0, since the area under the density curve and over an interval of length zero is zero.

Exercise.
Let X be the continuous random variable with PDF as follows:

    f(x) = (3/8)(x + 1)^2 when −1 ≤ x ≤ 1, and 0 otherwise.

The PDF and CDF of X are displayed below:

(a) Completely specify the cumulative distribution function of X.

(b) Find P(X ≥ 0).

Exercise. Let X be the continuous random variable with PDF as follows:

    f(x) = 200/(10 + x)^3 when x ≥ 0, and 0 otherwise.

The PDF and CDF of X are displayed below:

(a) Completely specify the cumulative distribution function of X.

(b) Find P(X ≥ 8).

(c) Suppose instead that the PDF of X is f(x) = c/(10 + x)^4 when x ≥ 0 and 0 otherwise, where c is some positive constant. Find the value of c.

2.3.2 Quantiles, Percentiles

Let X be a continuous random variable, and let p be a proportion satisfying 0 < p < 1. The pth quantile (or 100pth percentile) of the X distribution is the point x_p satisfying the equation

    P(X ≤ x_p) = p.

To find x_p, solve the equation F(x) = p for x.

Median, quartiles, interquartile range. Important special cases are:

1. Median: The median of X is the 50th percentile.

2. Quartiles: The quartiles of X are the 25th, 50th, and 75th percentiles. In addition, the interquartile range (IQR) is the following difference:

    IQR = 75th Percentile − 25th Percentile.

Note that the median is a measure of the center of a continuous distribution, and the interquartile range is a measure of the spread of the distribution.

Exercise. Recall that positive angles are measured counterclockwise from the positive x-axis, and negative angles are measured clockwise from the positive x-axis. Let Θ be an angle in the interval [0, π]. Given θ ∈ [0, π], construct a ray through the origin at angle θ and let (x, y) be the point where the ray intersects the half-circle of radius 1, as illustrated in the plot to the right. Assume Θ is a uniform random variable on the interval [0, π]. Find the probability that

(a) the x-coordinate of the point of intersection is less than 1/2.
(b) the y-coordinate of the point of intersection is less than 1/2.

2.3.4 Exponential Distribution, Relationship to Poisson Process

Let λ be a positive real number. The random variable X is said to be an exponential random variable, or to have an exponential distribution, with parameter λ when its PDF is as follows:

    f(x) = λ e^(−λx) when x ≥ 0, and 0 otherwise.

General forms of the PDF and CDF of X are shown below. Note that I have marked the location of the median in each plot.

Exercise. Let X be an exponential random variable with parameter λ.

(a) Completely specify the cumulative distribution function of X.

(b) Find a general formula for the pth quantile of the X distribution.

(c) Use your answer to part (b) to show that the median of the X distribution is ln(2)/λ and that the interquartile range is ln(3)/λ.

Relationship to Poisson process. If events occurring over time follow an approximate Poisson process with rate λ, where λ is the average number of events per unit time, then the time between successive events has an exponential distribution with parameter λ. To see this:

(i) If you observe the process for t units of time and let Y equal the number of observed events, then Y has a Poisson distribution with parameter λt. The PDF of Y is as follows:

    p(y) = e^(−λt) (λt)^y / y! when y = 0, 1, 2, . . ., and 0 otherwise.

(ii) Suppose an event occurs, the clock is reset to time 0, and X is the time until the next event occurs. Then X is a continuous random variable whose range is x > 0. Further,

    P(X > t) = P(0 events in the interval [0, t]) = P(Y = 0) = e^(−λt), and P(X ≤ t) = 1 − e^(−λt).

(iii) The PDF of X can be obtained from the CDF using derivatives. Since

    f(t) = d/dt P(X ≤ t) = d/dt (1 − e^(−λt)) = λ e^(−λt) when t > 0 (and 0 otherwise)

is the same as the PDF of an exponential random variable with parameter λ, X has an exponential distribution with parameter λ.
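The derivation above can be checked by simulation. A minimal sketch: exponential inter-event times are generated by inversion, then the empirical CDF and median are compared with 1 − e^(−λt) and ln(2)/λ. The rate λ = 1.5 is an arbitrary illustrative choice, not a value from the notes:

```python
import math
import random

random.seed(1)

lam = 1.5          # illustrative rate: average events per unit time
n = 200_000        # number of simulated inter-event times

# Inversion: if U ~ Uniform(0,1), then X = -ln(1 - U)/lam satisfies
# P(X <= t) = 1 - exp(-lam * t), the exponential CDF derived above.
times = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

# Empirical CDF at t = 1 versus the exact value 1 - e^{-lam}
t = 1.0
empirical = sum(x <= t for x in times) / n
print(round(empirical, 3), round(1 - math.exp(-lam * t), 3))

# Empirical median versus the exact median ln(2)/lam
times.sort()
print(round(times[n // 2], 3), round(math.log(2) / lam, 3))
```

With 200,000 samples the empirical values typically agree with the exact ones to about two decimal places.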
Example (Hand et al., Chapman & Hall, 1994). Researchers in Great Britain studied the occurrences of major earthquakes worldwide over the period 1900 to 1980. They determined that the average time between successive events was approximately 425 days and that events followed an approximate Poisson process with rate λ = 1/425 (or 1 event every 425 days, on average).

Example. Suppose that cars passing a certain intersection in the middle of a work day follow a Poisson process, with an average of 5 cars per hour. Let X be the time in hours until the third car passes. Then X has a gamma distribution with parameters α = r = ____ and λ = ____. The PDF and CDF of X are displayed below:

• The probability that three cars will pass the intersection in one hour or less is ____

• The probability that it takes more than 45 minutes for three cars to pass the intersection is ____

2.3.6 Distributions Related to Poisson Processes

In summary, there are three distributions related to Poisson processes over time:

1. Poisson distribution: If X is the number of events occurring in a fixed period of time, then X is a Poisson random variable with parameter λ, where λ equals the average number of events for that fixed period of time. The probability that exactly x events occur in that period is

    P(X = x) = e^(−λ) λ^x / x! when x = 0, 1, 2, . . ., and 0 otherwise.

2. Exponential distribution: If X is the time between successive events, then X is an exponential random variable with parameter λ, where λ is the average number of events per unit time. The CDF of X is

    F(x) = 1 − e^(−λx) when x > 0, and 0 otherwise.

3. Gamma distribution: If X is the time to the rth event, then X is a gamma random variable with parameters α = r and λ, where λ is the average number of events per unit time. The CDF of X is

    F(x) = 1 − Σ_{y=0}^{r−1} e^(−λx) (λx)^y / y! when x > 0, and 0 otherwise.

2.3.7 Cauchy Distribution

Let a be a real number and b be a positive real number.
The continuous random variable X is said to be a Cauchy random variable, or to have a Cauchy distribution, with center a and spread b when its PDF and CDF are as follows:

1. Cauchy PDF: f(x) = b / (π(b^2 + (x − a)^2)) for all real numbers x.

2. Cauchy CDF: F(x) = 1/2 + (1/π) tan^(−1)((x − a)/b) for all real numbers x.

General forms of the PDF and CDF of X are shown below. Note that I have marked the locations of the quartiles of the distribution.

Exercise. Let X be a Cauchy random variable with center a and spread b.

(a) Find a general formula for the pth quantile of the X distribution.

(b) Use your answer to part (a) to demonstrate that the median of the X distribution is a and the interquartile range is 2b.

When z is negative, we use the fact that the normal density curve is symmetric to find cumulative probabilities. Specifically, if z < 0, then

    Φ(z) = 1 − Φ(−z),

as illustrated on the right.

Exercise. Use Table 1 to find P(Z ≤ 0.53), P(Z ≤ −1.32), and P(−1 < Z < 1.5).

Quantiles of the standard normal distribution. Let z_p be the pth quantile of the standard normal distribution. That is, z_p is the point satisfying the equation Φ(z_p) = p.

Exercise. Use Table 1 to find (approximately) the 20th, 40th, 60th, and 80th percentiles of the standard normal distribution.

Table 1: Standard normal cumulative probabilities, Φ(z) = P(Z ≤ z), when z ≥ 0.
      z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
    0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
    0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
    0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
    0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
    0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
    0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
    0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
    0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
    0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
    0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
    1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
    1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
    1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
    1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
    1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
    1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
    1.6  0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
    1.7  0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
    1.8  0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
    1.9  0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
    2.0  0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
    2.1  0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
    2.2  0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
    2.3  0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
    2.4  0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
    2.5  0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
    2.6  0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
    2.7  0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
    2.8  0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
    2.9  0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
    3.0  0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
    3.1  0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
    3.2  0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
    3.3  0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
    3.4  0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
    3.5  0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
    3.6  0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
    3.7  0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
    3.8  0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999

Quantiles of the normal distribution. Let x_p be the pth quantile of the normal distribution with mean µ and standard deviation σ. Then

    x_p = µ + z_p σ,

where z_p is the pth quantile of the standard normal distribution.

Exercise. Let X be a normal random variable with mean µ and standard deviation σ. Use Table 1 to demonstrate that the median of the X distribution is µ and the interquartile range is approximately 1.36σ.

Exercise (Agresti & Franklin, 2007, page 306). Distributions of heights for adult men and for adult women are often well approximated by normal distributions. In North America, for example:

(1) Women's Heights: The distribution of heights for adult women is well approximated by a normal distribution with mean 65 inches and standard deviation 3.5 inches.

(2) Men's Heights: The distribution of heights for adult men is well approximated by a normal distribution with mean 70 inches and standard deviation 4 inches.
In each case, I drew a "z-axis," where z = (x − µ)/σ, under the x-axis, and highlighted the area under the curve between 5 feet (60 inches) and 6 feet (72 inches).

Exercise. Consider a wire of length 10 inches, with a coordinate system as shown on the right. Given x ∈ (0, 10), imagine bending the wire at position x by 90° to form a right triangle, and let y be the area of that triangle. Let X be an arbitrary bending point and Y be the area of the resulting triangle. Assume that X has a uniform distribution on (0, 10). Completely specify the PDF of Y.

Exercise. Let Z be the standard normal random variable, and let Y = Z². Completely specify the PDF of Y.
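The CDF method behind these transformation exercises (express F_Y in terms of F_X, then differentiate) can be illustrated on a simpler warm-up case that is not one of the exercises above. Assuming X ~ Uniform(0, 1) and Y = X², the method gives F_Y(y) = P(X² ≤ y) = P(X ≤ √y) = √y for 0 < y < 1, so f_Y(y) = 1/(2√y). A sketch that checks this by simulation:

```python
import math
import random

random.seed(2)

# Warm-up transformation: X ~ Uniform(0, 1), Y = X^2.
# By the CDF method, F_Y(y) = sqrt(y) on (0, 1), so f_Y(y) = 1/(2 sqrt(y)).
n = 200_000
ys = [random.random() ** 2 for _ in range(n)]

# Compare the empirical CDF of the simulated Y values with sqrt(y).
for y in (0.1, 0.25, 0.5, 0.9):
    empirical = sum(v <= y for v in ys) / n
    print(y, round(empirical, 3), round(math.sqrt(y), 3))
```

The same pattern (monotone transformation, invert inside the probability statement, differentiate) applies to the wire-bending and Y = Z² exercises, with the extra care that Z² requires combining two branches of the inverse.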