
This document introduces discrete and continuous random variables, with a focus on continuous distributions. It covers histograms, relative frequency histograms, quartiles, the interquartile range, and exploratory data analysis, and it introduces the uniform and exponential distributions, explaining how to calculate their means and variances.

What you will learn

- What is the role of class boundaries or cutpoints in analyzing continuous data?
- What is the difference between discrete and continuous random variables?
- What are the three quartiles and how are they used to describe a sample of data?
- How do you construct a histogram for continuous data?
- What is the interquartile range and how is it calculated?

Typology: Exercises

2021/2022


Chapter 3: Continuous Distributions

3.1 Continuous-Type Data

In Chapter 2 we discussed random variables whose space $S$ contains a countable number of outcomes (the discrete type). In Chapter 3 we study random variables whose space $S$ contains an interval of numbers (the continuous type).

For a set of continuous-type data, we can group the data into classes of equal width and then construct a histogram of the grouped data:

1. Determine the minimum observation $m$ and the maximum observation $M$. The range is $R = M - m$.
2. Divide $[m, M]$ into $k = 5$ to $k = 20$ class intervals of equal width, say $(c_0, c_1), (c_1, c_2), \ldots, (c_{k-1}, c_k)$, where $c_0 = m$ and $c_k = M$.
3. The boundaries $c_0, c_1, \ldots, c_k$ are called class boundaries or cutpoints. The class mark is the midpoint of a class. The class limits are the smallest and largest possible observed values in a class.

The frequency table, frequency histogram, and relative frequency histogram may all be used to analyze the grouped data. A relative frequency histogram (or density histogram) consists of rectangles, each with base a class interval and area the relative frequency of that class. The height of the rectangle with base $(c_{i-1}, c_i)$ is

$$h(x) = \frac{f_i}{n\,(c_i - c_{i-1})}, \qquad c_{i-1} < x \le c_i, \quad i = 1, 2, \ldots, k,$$

where $f_i$ is the number of observations in the class $(c_{i-1}, c_i)$ and $n$ is the total number of observations.

Ex 1, p.112 (scanned file)

If the data $x_1, x_2, \ldots, x_n$ have sample mean $\bar{x}$ and sample standard deviation $s$, and the histogram of the data is "bell shaped", then approximately

- 68% of the data lie in the interval $(\bar{x} - s, \bar{x} + s)$;
- 95% of the data lie in the interval $(\bar{x} - 2s, \bar{x} + 2s)$;
- 99.7% of the data lie in the interval $(\bar{x} - 3s, \bar{x} + 3s)$.

The relative frequency polygon may also be used to study the data.

Ex 2, p.114 (scanned file)
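The histogram recipe above can be sketched numerically. This is a minimal illustration assuming NumPy is available; the bell-shaped sample, the seed, and the class count $k = 10$ are my own choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=1000)  # bell-shaped sample (my own choice)

# Steps 1-2: class boundaries c_0, ..., c_k of equal width on [m, M]
k = 10
m, M = x.min(), x.max()
edges = np.linspace(m, M, k + 1)

# Height of each density-histogram rectangle: h = f_i / (n * (c_i - c_{i-1}))
counts, _ = np.histogram(x, bins=edges)
heights = counts / (x.size * np.diff(edges))

# The total area under the density histogram is 1
area = float(heights @ np.diff(edges))
print(area)  # 1.0 (up to floating-point error)

# Empirical rule: about 68% of bell-shaped data lie within one s of x-bar
xbar, s = x.mean(), x.std(ddof=1)
within1 = float(np.mean((x > xbar - s) & (x < xbar + s)))
print(within1)  # roughly 0.68
```

Because each rectangle has area $f_i/n$, the areas sum to 1, which is what makes the density histogram comparable to a p.d.f.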
Sometimes class intervals are not required to have equal width.

Ex 3, p.116 (scanned file)

Homework §3.1: 3, 7, 9
Attachment: Scanned textbook pages of Section 3-1

3.3 Random Variables of the Continuous Type

The relative frequency histogram $h(x)$ of a continuous r.v. $X$ is defined so that the area below its graph between $a$ and $b$ approximates $P(a < X < b)$. For many continuous random variables $X$ there exists a probability density function (p.d.f.) $f(x)$ of $X$ such that

1. $f(x) > 0$ for $x \in S$;
2. $\int_S f(x)\,dx = 1$;
3. if $(a, b) \subseteq S$, then $P(a < X < b) = \int_a^b f(x)\,dx$.

For convenience, we often define $f(x)$ on $(-\infty, \infty)$ by letting $f(x) = 0$ for $x \notin S$. Then $f(x) \ge 0$ for all $x$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Ex 1, p.132 (scanned file)

The (cumulative) distribution function (c.d.f.) of a random variable $X$ of the continuous type, defined in terms of the p.d.f. of $X$, is

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \qquad -\infty < x < \infty.$$

The c.d.f. $F(x)$ is an increasing function with

1. $\lim_{x \to -\infty} F(x) = 0$,
2. $\lim_{x \to \infty} F(x) = 1$, and
3. $F'(x) = f(x)$.

Ex 2, p.133 (scanned file)
Ex 3, p.134 (scanned file)

Let $X$ be a continuous random variable with p.d.f. $f(x)$.

- The expected value (mean) of $X$ is $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$.
- The variance of $X$ is
  $$\sigma^2 = \operatorname{Var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E[X^2] - E[X]^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2.$$
- The standard deviation of $X$ is $\sigma = \sqrt{\operatorname{Var}(X)}$.
- The moment-generating function (m.g.f.) of $X$, if it exists, is
  $$M(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, \qquad -h < t < h.$$

Ex 5, 6, p.136 (scanned file)

The $(100p)$th percentile is a number $\pi_p$ such that the area under $f(x)$ to the left of $\pi_p$ is $p$; that is,

$$p = \int_{-\infty}^{\pi_p} f(x)\,dx = F(\pi_p).$$

Recall that the 50th percentile is called the median, and the 25th and 75th percentiles are called the first and third quartiles, respectively.
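The p.d.f./c.d.f. definitions above can be checked on a concrete example. The density $f(x) = 3x^2$ on $S = (0, 1)$ is my own toy choice, not from the text (its c.d.f. is $F(x) = x^3$, its mean $3/4$, its variance $3/80$); the sketch approximates the integrals with a midpoint Riemann sum:

```python
import numpy as np

# Toy p.d.f. (my own choice, not from the text): f(x) = 3x^2 on S = (0, 1),
# with c.d.f. F(x) = x^3, mean 3/4, and variance 3/80.
def f(x):
    return 3 * x**2

dx = 1e-5
xs = np.arange(dx / 2, 1.0, dx)  # midpoints of small subintervals of (0, 1)

total = float(np.sum(f(xs)) * dx)                # property 2: integral of f over S is 1
mu = float(np.sum(xs * f(xs)) * dx)              # mean, exact value 3/4
var = float(np.sum(xs**2 * f(xs)) * dx) - mu**2  # variance, exact value 3/80

# 50th percentile (median): solve F(pi) = pi^3 = 0.5
median = 0.5 ** (1 / 3)

print(total, mu, var, median)
```

The same midpoint-sum pattern works for any p.d.f. given in closed form, which makes it a quick sanity check on hand computations of means and variances.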
For empirical data, if $y_1 \le y_2 \le \cdots \le y_n$ are the order statistics associated with the sample $x_1, x_2, \ldots, x_n$, then $y_r$ is called the quantile of order $r/(n+1)$, as well as the $100r/(n+1)$th percentile. The plot of $(y_r, \pi_p)$ for several values of $r$ is called the quantile-quantile plot, or q-q plot (see Fig 3.6-3 on p.165, scanned file).

Ex 7, 8, p.137-139 (scanned file)

Homework §3.3: 1, 7, 23, 25
Attachment: Scanned textbook pages of Section 3-3

3.4 The Uniform and Exponential Distributions

A r.v. $X$ has a uniform distribution if its p.d.f. is equal to a constant on its support. In particular, if the support is $[a, b]$, then we say that $X \sim U(a, b)$, and

p.d.f.: $f(x) = \dfrac{1}{b - a}, \quad a \le x \le b.$

c.d.f.: $F(x) = \displaystyle\int_{-\infty}^{x} f(t)\,dt = \begin{cases} 0, & x < a, \\ \dfrac{x - a}{b - a}, & a \le x < b, \\ 1, & b \le x. \end{cases}$

mean: $\mu = E(X) = \dfrac{a + b}{2}.$

variance: $\sigma^2 = E(X^2) - E(X)^2 = \dfrac{(b - a)^2}{12}.$

m.g.f.: $M(t) = \begin{cases} \dfrac{e^{tb} - e^{ta}}{t(b - a)}, & t \ne 0, \\ 1, & t = 0. \end{cases}$

The curves of the p.d.f. and c.d.f. of $U(a, b)$ are given in Fig 3.4-1, p.141 (scanned file). An important uniform distribution is $U(0, 1)$.

Ex 1, p.142 (scanned file)

Now we explore the exponential distribution. Suppose that in some process the number of changes occurring in a unit time interval has a Poisson distribution with mean $\lambda$. This count is a discrete random variable, but the waiting time $W$ between successive changes is a continuous random variable with space $[0, \infty)$. The distribution function of $W$ is

$$F(w) = P(W \le w) = 1 - P(W > w) = 1 - P(\text{no changes in } [0, w]) = 1 - e^{-\lambda w},$$

and the p.d.f. of $W$ is $F'(w) = \lambda e^{-\lambda w}$. We often let $\lambda = 1/\theta$ (so that $\theta$ is the average waiting time between successive changes) and say that the r.v. $X$ has an exponential distribution if its p.d.f. is

p.d.f.: $f(x) = \dfrac{1}{\theta}\, e^{-x/\theta}, \quad 0 \le x < \infty.$

Let $r$ be a positive integer. The gamma distribution $X$ with $\theta = 2$ and $\alpha = r/2$ has a chi-square distribution with $r$ degrees of freedom, denoted by $X \sim \chi^2(r)$.
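The Poisson waiting-time derivation of the exponential distribution in Section 3.4 can be checked by simulation. This is a sketch assuming NumPy; the rate $\lambda = 2$ and the seed are my own choices:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0        # Poisson rate lambda: mean changes per unit time (my choice)
theta = 1 / lam  # average waiting time between successive changes

# Waiting times W between successive changes are exponential with mean theta
w = rng.exponential(scale=theta, size=200_000)
mean_wait = float(w.mean())
print(mean_wait)  # close to theta = 0.5

# Empirical F(w) at w = 1 vs. the derived c.d.f. 1 - exp(-lam * w)
emp = float(np.mean(w <= 1.0))
print(emp, 1 - np.exp(-lam))  # both about 0.865

# Counts of changes in a unit interval: sum waiting times until they pass 1
arrivals = np.cumsum(rng.exponential(scale=theta, size=(50_000, 25)), axis=1)
counts = np.sum(arrivals <= 1.0, axis=1)
print(float(counts.mean()))  # close to lam = 2.0
```

The last step is the derivation run in reverse: exponential gaps summed into arrival times recover Poisson counts per unit interval.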
p.d.f.: $f(x) = \dfrac{1}{\Gamma(r/2)\, 2^{r/2}}\, x^{r/2 - 1} e^{-x/2}, \quad 0 \le x < \infty.$

m.g.f.: $M(t) = (1 - 2t)^{-r/2}, \quad t < \dfrac{1}{2}.$

mean: $\mu = \alpha\theta = r.$

variance: $\sigma^2 = \alpha\theta^2 = 2r.$

Fig 3.5-2 (p.152) shows the p.d.f.'s of some chi-square distributions. Table IV in the Appendix can be used to find the c.d.f.'s of chi-square distributions (scanned file).

Ex 3-4, p.152-153 (scanned file)

For a chi-square distribution $X \sim \chi^2(r)$, the number $\chi^2_\alpha(r)$ is defined so that $P[X \ge \chi^2_\alpha(r)] = \alpha$. Thus $\chi^2_\alpha(r)$ is the $100(1 - \alpha)$th percentile of the chi-square distribution with $r$ degrees of freedom.

Ex 5, p.153 (scanned file)
Ex 6, p.154 (scanned file)

Homework §3.5: 1, 7, 9, 15, 17
Attachment: Scanned textbook pages of Section 3-5

3.6 The Normal Distribution

A random variable $X$ has a normal distribution if its p.d.f. is defined by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty,$$

where the parameters satisfy $\mu \in (-\infty, \infty)$ and $\sigma > 0$. We say that $X \sim N(\mu, \sigma^2)$. The r.v. $X$ has

m.g.f.: $M(t) = \exp\left(\mu t + \dfrac{\sigma^2 t^2}{2}\right), \quad -\infty < t < \infty.$

mean: $E(X) = M'(0) = \mu.$

variance: $\operatorname{Var}(X) = M''(0) - [M'(0)]^2 = \sigma^2.$

Ex 1-2, p.158-159 (scanned file)

If $Z \sim N(0, 1)$, we say that $Z$ has a standard normal distribution. Clearly $Z$ has mean 0, variance 1, m.g.f. $M(t) = e^{t^2/2}$, and

c.d.f.: $\Phi(z) = P(Z \le z) = \displaystyle\int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\,dw.$

Table Va gives the c.d.f. $P(Z \le z)$ of the standard normal distribution $N(0, 1)$ for positive $z$, and Table Vb gives $P(Z > z)$. The p.d.f. curve of $N(0, 1)$ is symmetric about the $y$-axis, so for $z > 0$ we have $\Phi(-z) = 1 - \Phi(z)$; thus Table Va alone suffices.

Ex 3-4, p.159-160 (scanned file)

In statistical applications we often want to find $z_\alpha$ for $Z \sim N(0, 1)$ such that $P(Z \ge z_\alpha) = \alpha$. That is, $z_\alpha$ is the $100(1 - \alpha)$th percentile (also called the upper $100\alpha$ percent point) of the standard normal distribution (see Fig 3.6-2 in the scanned file). By the symmetry of the p.d.f.
of $N(0, 1)$,

$$P(Z \ge z_{1-\alpha}) = 1 - \alpha = 1 - P(Z \ge z_\alpha) = P(Z \ge -z_\alpha),$$

and thus $z_{1-\alpha} = -z_\alpha$.

Ex 5, p.161 (scanned file)

Thm 3.1. If $X \sim N(\mu, \sigma^2)$, then $Z := \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.

The proof is skipped, but it is easy to verify that $E(Z) = 0$ and $\operatorname{Var}(Z) = 1$. By this theorem, any normal distribution can be transformed into the standard normal distribution.

Ex 6-7, p.162-163 (scanned file)

(optional) The normal distribution is also related to the chi-square distribution.

Thm 3.2. If $X \sim N(\mu, \sigma^2)$, then the r.v. $V := \dfrac{(X - \mu)^2}{\sigma^2} = Z^2 \sim \chi^2(1)$.

The q-q plot can be used to examine the agreement between empirical data and a conjectured model.

Ex 9, p.164-165 (scanned file)

Homework §3.6: 1, 3, 5, 7, 13
Attachment: Scanned textbook pages of Section 3-6
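The standard-normal facts in this section (the symmetry identity, the Thm 3.1 standardization, and the upper percent point $z_\alpha$) can be reproduced without tables using the error function. This sketch uses only the Python standard library; the parameters $\mu = 100$, $\sigma = 15$ are my own illustrative choices:

```python
import math

def Phi(z):
    # Standard normal c.d.f. via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Symmetry: Phi(-z) = 1 - Phi(z)
print(Phi(-1.5), 1 - Phi(1.5))

# Thm 3.1: if X ~ N(mu, sigma^2), then
# P(a < X < b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma)
mu, sigma = 100.0, 15.0  # my own illustrative parameters
a, b = 85.0, 115.0       # one standard deviation on each side of the mean
p = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)
print(p)  # about 0.6827

# Upper percent point z_alpha with P(Z >= z_alpha) = alpha, found by bisection
def z_upper(alpha, lo=-10.0, hi=10.0):
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1 - Phi(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(z_upper(0.05), 3))  # about 1.645
print(round(z_upper(0.95), 3))  # about -1.645, confirming z_{1-alpha} = -z_alpha
```

Note that the one-standard-deviation probability recovers the 68% figure of the empirical rule from Section 3.1, and the last two lines numerically confirm $z_{1-\alpha} = -z_\alpha$.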