An introduction to population distributions, focusing on frequency distributions for discrete and continuous random variables. It covers probability distributions, discrete random variables (Bernoulli and Poisson distributions), and continuous random variables. The document also discusses the mean and variance of these distributions and their relationship to the density curve.

Typology: Study notes

Pre 2010

Topic (8) – POPULATION DISTRIBUTIONS

So far:

We’ve seen some ways to summarize a set of data, including numerical summaries.

We’ve heard a little about how to sample a population effectively in order to get good estimates of the population quantities of interest (e.g. taking a good sample and calculating the sample mean as a way of estimating the true but unknown population mean value).

We’ve talked about the ideas of probability and independence.

Now we need to start putting all this together in order to do Statistical Inference: the methods of analyzing data and interpreting the results of those analyses with respect to the population(s) of interest.

The Probability Distribution for a random variable can be a table, a graph, or an equation.

Let’s start by reviewing the ideas of frequency distributions for populations using categorical variables.

QUALITATIVE (NON-NUMERIC) VARIABLES

For a random variable that takes on values of categories, the Probability Distribution is a table showing the likelihood of each value.

EXAMPLE: Tree species found in a boreal forest. For each possible species there would be a probability associated with it. E.g. suppose there are 4 species, three of which are very rare and one of which is very common. A probability table might look like:

Species   Probability
1         0.01
2         0.03
3         0.08
4         0.88
All       1.00

We interpret these values as the probability that a random selection would result in observing that species. We could also draw a bar chart, but it would be fairly uninformative in this instance since one value is so much larger than the others! An equation cannot be developed since the values that the variable takes on are non-numeric.
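The interpretation of the table as "the probability that a random selection would result in observing that species" can be checked by simulation. The following is a minimal Python sketch, not part of the original notes: the number of draws and the random seed are arbitrary assumptions, and the species are labelled 1 to 4 as in the table.

```python
import random
from collections import Counter

# Probability table from the tree-species example (species label -> probability).
species_probs = {1: 0.01, 2: 0.03, 3: 0.08, 4: 0.88}

# A valid probability distribution must have probabilities that sum to 1.
total = sum(species_probs.values())
assert abs(total - 1.0) < 1e-9

# Simulate random selections of trees.
# The seed and the number of draws are arbitrary choices for illustration.
rng = random.Random(0)
n_draws = 10_000
draws = rng.choices(
    list(species_probs),                   # possible species
    weights=list(species_probs.values()),  # their probabilities
    k=n_draws,
)

# Relative frequency of each species among the simulated selections.
rel_freq = {s: c / n_draws for s, c in sorted(Counter(draws).items())}
print(rel_freq)
```

With many draws, the relative frequencies should land close to the probabilities in the table, with species 4 dominating.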
Since we have sampled the entire population (the set of counts for every quadrat in the region), this histogram represents the probability distribution of the random variable X = "number of trees/quadrat".

In general, the Poisson distribution is a common probability distribution for counts per unit time, unit area, or unit volume. The graph can also be described using an equation known as the Poisson Distribution Probability Mass Function. It gives the probability of observing a specific count (x) in any randomly selected quadrat as

$$\Pr(X = x) = \frac{\mu^x e^{-\mu}}{x!}$$

where $x! = x(x-1)(x-2)\cdots(3)(2)(1)$ and $x = 0, 1, 2, \ldots$

In order for this distribution to be a valid probability distribution, we require that the total probability for all possible values equal 1 and that every possible value have a probability associated with it:

$$\sum_{x = 0, 1, 2, \ldots} \Pr(X = x) = \sum_{x = 0, 1, 2, \ldots} \frac{\mu^x e^{-\mu}}{x!} = 1$$

and

$$\Pr(X = x) = \frac{\mu^x e^{-\mu}}{x!} \ge 0$$

The mean of the Poisson distribution is µ and the variance is µ as well.

DISCRETE UNIFORM DISTRIBUTION: every discrete value that the random variable can take on has the same probability of occurring.

For example, suppose a researcher is interested in whether the number of setae on the first antennae of an insect is random or not. Further, the researcher believes that there must be at least 1 seta and at most 8. Then s/he is postulating that every value between 1 and 8 is equally likely to be observed in a random draw of an insect from the population (or equivalently, that there are equal numbers of insects with 1, 2, …, or 8 setae in the population). Such a distribution is known as the Discrete Uniform Distribution.

Let K be the total number of distinct values that the random variable can take on (e.g. the set {1, 2, …, 8} contains K = 8 distinct values).
Then, $\Pr(X = x) = \frac{1}{K}$ for x = 1, 2, …, 8.

In addition, the mean for this particular discrete uniform is

$$\mu = \frac{\sum x}{K} = \frac{36}{8} = 4.5$$

and the variance is

$$\sigma^2 = \frac{\sum (x - 4.5)^2}{K} = 5.25$$

Also, it is easy to see that the probabilities sum to 1 as required. Finally, the graph of the distribution looks like a rectangle:

[Figure: graph of the discrete uniform distribution; horizontal axis from 0 to 8]

Fact 3: When the curve is describing the frequency distribution of the population, every observation must fall within the limits of the distribution. Hence, 100% of the observations are listed.

When we combine these three facts, we get that the density curve describing the frequency distribution of values of a quantitative variable

1) has a total area under the curve of 1 (analogous to 100%), and
2) has the property that the area over a range of values equals the relative frequency of that range in the population, i.e. the area equals the probability of observing a value within that range.

[Figure: density curve with vertical lines at 5 and 8. The area in between these two lines is the probability that X falls between the values of 5 and 8.]

There are many standard (common) density curves:

UNIFORM DISTRIBUTION – every subinterval of the same length is equally likely. For example, suppose we randomly selected a number from the number line [0, 10]. Then the probability distribution is given by

$$\Pr(a < X < b) = \frac{b - a}{U - L}$$

for $X \in [L, U]$ and $U > L \ge 0$.

[Figure: Uniform density curve over the interval 0 to 10]

e.g. Pr(3 < X < 4) =

The mean of a Uniform distribution is $\mu = \frac{L + U}{2}$ and the variance is $\sigma^2 = \frac{(U - L)^2}{12}$.

Question: What do we do when the value of interest in the probability statement does NOT fall exactly at the standard deviation cutoffs? E.g. find Pr(IQ < 110)?

Answer: Convert the value to a Z-score and use it with a look-up table (or a computer program) to calculate the probability.
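As an illustration of the "computer program" route mentioned in the answer, here is a minimal Python sketch using only the standard library. It assumes IQ scores are normally distributed with mean 100 and standard deviation 15 (the values used in the notes' IQ example); the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed population parameters for IQ (from the notes' IQ example).
mu, sigma = 100, 15
x = 110  # value of interest: we want Pr(IQ < 110)

# Step 1: convert the value to a Z-score,
# i.e. the number of standard deviations it lies from the mean.
z = (x - mu) / sigma

# Step 2: the area to the left of z under the standard normal curve
# (mean 0, standard deviation 1) is the required probability.
p_from_z = NormalDist().cdf(z)

# The same probability computed directly, without standardizing by hand.
p_direct = NormalDist(mu, sigma).cdf(x)

print(round(z, 3), round(p_from_z, 4), round(p_direct, 4))
```

Both routes give a probability of roughly 0.75. A printed table that rounds z to two decimals (z = 0.67) gives 0.7486, which differs slightly from the unrounded computation.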
Recall the Z-SCORE for a value is the number of standard deviations that value is from the mean:

$$\text{Z-score} = z^* = \frac{x - \mu}{\sigma}$$

e.g. IQ of 110 ≡ $z^* = \frac{110 - \mu}{\sigma} = \frac{110 - 100}{15} = 0.667$

Defn: When X is normally distributed, the Z-score has a STANDARD NORMAL DISTRIBUTION. The Standard normal distribution is a normal distribution with a mean of µ = 0 and a standard deviation of σ = 1.

                     µ−3σ   µ−2σ   µ−1σ   µ     µ+1σ   µ+2σ   µ+3σ
Original IQ score    55     70     85     100   115    130    145
Equivalent Z-score   -3     -2     -1     0     +1     +2     +3

So, the important point here is that we need to do the conversion

$$\Pr(X < a) = \Pr\left(\frac{X - \mu}{\sigma} < \frac{a - \mu}{\sigma}\right) = \Pr(Z < z)$$

in order to find probabilities of events under a normal distribution, e.g.

$$\Pr(IQ < 110) = \Pr\left(\frac{IQ - \mu}{\sigma} < \frac{110 - \mu}{\sigma}\right) = \Pr\left(\frac{IQ - 100}{15} < \frac{110 - 100}{15}\right) = \Pr(Z < 0.667)$$

Next, look up the area (i.e. probability) on a table: $\Pr(Z < 0.667) = 0.7486$, so approximately 75% of the population has an IQ less than 110.

Some practice which also uses the rules for Probability that we learned earlier:

1. Find Pr(IQ > 92).
2. Find Pr(70 < IQ < 120).

Finding Quantiles for the Normal Distribution

Most often used to find extreme values in the very highest (or lowest) percentages.

EXAMPLE: Suppose adult male heights are normally distributed with a mean of 69" and a standard deviation of 3.5". We have learned how to answer questions like: What proportion of the population are taller than 6' (72")?

How do we answer a question like: Find the range of likely heights for the shortest 5% of the male population, i.e. what height is the 5th percentile of the population?

Here we are being asked to find the value of a that makes the following probability statement true:

Pr(Height < a) = 0.05

We know that Pr(Height < a) = Pr(Z < z*).

So we’ll start by solving Pr(Z < z*) = 0.05 for z*.
Now, we’ll use the fact that $z^* = \frac{a - \mu}{\sigma}$ and our knowledge of the values of µ and σ to solve for a.
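The two steps (solve for z*, then solve for a) can be sketched numerically. This is a minimal illustration in Python rather than the table-based method of the notes; it assumes the height example's values (mean 69", standard deviation 3.5") and uses the standard library's inverse normal CDF in place of a look-up table.

```python
from statistics import NormalDist

# Assumed population parameters for adult male height, in inches (from the example).
mu, sigma = 69.0, 3.5
p = 0.05  # we want the 5th percentile: Pr(Height < a) = 0.05

# Step 1: solve Pr(Z < z*) = 0.05 for z* using the inverse CDF
# of the standard normal distribution.
z_star = NormalDist().inv_cdf(p)

# Step 2: rearrange z* = (a - mu) / sigma to get a = mu + z* * sigma.
a = mu + z_star * sigma

# Check: the probability of a height below a should be p again.
p_check = NormalDist(mu, sigma).cdf(a)

print(round(z_star, 3), round(a, 2), round(p_check, 4))
```

Here z* comes out at about -1.645, so the 5th percentile is about 63.2": the shortest 5% of the population is shorter than roughly 5'3".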