



These notes explain how sampling distributions provide a logical basis for making inferences about populations by measuring variability among sample means. They cover the concept of testing hypotheses, the role of the standard deviation in determining the likelihood of observing a particular sample mean, and the importance of the central limit theorem. They also discuss the relationship between population and sample means, the concept of standard error, and the minimum sample size required for hypothesis testing.




Sampling Distributions

Introduction

Sampling distributions are a troublesome topic for many students. However, they are important because they are the basis for making statistical inferences about a population from a sample. One problem sampling distributions solve is to provide a logical basis for using samples to make inferences about populations. Sampling distributions also provide a measure of variability among a set of sample means. This measure of variability, in turn, allows one to estimate the likelihood of observing a particular sample mean collected in an experiment.

At the simplest level, testing a hypothesis means testing whether an obtained sample comes from a known population, usually the general population. If the sample value is likely for the known population, then the sample plausibly comes from that population. If the sample value is unlikely for the known population, then it likely does not come from the known population, and it can be inferred that it comes instead from a different, unknown population.

If some treatment is performed, like giving a drug to improve patient recovery rates, then a sample value from the treated group allows a test of the idea that the treatment had some effect (here, on recovery rates). Does giving patients this new drug in effect create a new and different population, a population using the drug? If the average recovery rate of the treated group is very similar to, or likely for, the known population of patients that do not take the drug (the general population), then the treatment likely had no effect. If the average recovery rate is very different from, or very unlikely for, the known population of patients not taking the drug, then the treatment likely had an effect and created a new population of patients with different outcomes. Thus, some way to judge how likely a value is for the known population is needed.
The common formula used to find the probability or the likelihood of a value for a known population (solving z-score problems) is:

$$z = \frac{x - \mu}{\sigma}$$

In the above formula the standard deviation, sigma (σ), gives information about how much variability exists in the population. Knowing how much variability exists in the population (the width of the distribution of scores) allows one to know how likely a single x-value is for that population. Since most values in a distribution lie close to the mean, less likely values fall farther from the mean. The wider the distribution of scores, the less pronounced any specific difference between a value and the mean will be. For example, if the difference between an x-value and the population mean (the numerator) remains constant, then that difference is much more likely if the population has a very wide distribution (large denominator) than if it has a very narrow distribution (small denominator). So any factor that increases the relative difference between a value and the mean, like a decreased spread in the distribution of scores, lowers our estimate of how likely the value is for the distribution.
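The effect of the spread on the likelihood of a single value can be illustrated with a short calculation. The following is a minimal sketch, not part of the original notes: it assumes the population is normally distributed, and the mean of 100, the x-value of 110, and the two sigma values are hypothetical numbers chosen only for illustration.

```python
import math


def z_score(x, mu, sigma):
    """Distance of x from the population mean, in standard deviation units."""
    return (x - mu) / sigma


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical numbers: the same 10-point difference in a wide and a narrow population.
mu, x = 100.0, 110.0
for sigma in (20.0, 5.0):
    z = z_score(x, mu, sigma)
    print(f"sigma={sigma}: z={z:.2f}, P(X >= x)={upper_tail_probability(z):.4f}")
```

With the wide population (sigma of 20) the z-score is 0.5 and the upper-tail probability is about 0.31. With the narrow population (sigma of 5) the same 10-point difference gives a z-score of 2 and a probability of about 0.02.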
However, a hypothesis test is not based on a single x-value. Instead, a sample of values is used, from which the average is computed. If the sample mean is very different from the known population mean (μ), then it can be assumed that the population the sample represents is not the same as the known population. The problem in using the above formula is that sigma gives information about how much individual values vary within a population, but nothing about how much sample means vary. Sampling distributions provide an explanation of how to measure variability in samples, and thus the probability of observing a particular sample mean.

Sampling Distribution of the Mean

Sampling distributions are theoretical and are not actually computed. However, examining the process of computing one is necessary. There are many types of sampling distributions, and a sampling distribution can be formed for any statistic. For the current discussion, the sampling distribution of the mean is most relevant. To form a sampling distribution:
Notice that the denominator is an estimate of the standard error, and it plays the same role whether computing a z-test or a t-test. How far a sample mean falls from the mean of the population is judged relative to how much variability there is from sample to sample. If it is relatively unlikely to observe a certain sample mean (p < .05 for alpha = .05), then we conclude that the sample likely did not come from the known population.

Finally, sampling distributions also yield information about how large a sample needs to be in order to test a hypothesis. The shape of the sampling distribution of the mean approaches normal as the sample size grows, regardless of the shape of the population distribution. Whether the population distribution has a normal, positively or negatively skewed, unimodal or bimodal shape, the sampling distribution of the mean will have an approximately "normal" (unimodal and symmetric) shape once the samples are reasonably large. That's because when a distribution of sample means is formed, each value in the distribution is derived from a sample that contains a variety of scores from the population. Because each value in the sampling distribution is an average of these values from the population, most of the sample means will lie close to the mean of the population and form a unimodal and symmetric distribution, even if the single x-values in the population do not.

Recall that when values are used to form a sampling distribution, samples of any size can be used. However, the larger the number of values in each sample taken from the population, the more "normal" the sampling distribution will be. That's because there will be a larger variety of values from the population in any individual sample, and the average from each sample is more likely to approximate the average value of the entire population. As a rule of thumb, around 30 values in a sample provide enough variety for those values to average out very close to the average of the population.
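The claims above about the shape and spread of the sampling distribution of the mean can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the author's procedure: the population is a hypothetical exponential distribution (strongly right-skewed, with mean 1 and sigma 1), 5000 samples are drawn for each sample size, and the observed spread of the sample means is compared with the standard result that the standard error equals sigma divided by the square root of n.

```python
import math
import random
import statistics


def sample_means(draw, n, num_samples, rng):
    """Return the means of num_samples samples, each of size n, drawn with draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(num_samples)]


def skewness(values):
    """Skewness of a set of values: 0 if symmetric, positive if right-skewed."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def draw_exponential(rng):
    # Hypothetical skewed population: exponential with mean 1 and sigma 1.
    return rng.expovariate(1.0)


rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 1.0             # population standard deviation (known here by construction)
results = {}
for n in (2, 30):
    means = sample_means(draw_exponential, n, 5000, rng)
    results[n] = {
        "mean_of_means": statistics.fmean(means),   # should be near the population mean
        "observed_se": statistics.pstdev(means),    # spread of the sample means
        "predicted_se": sigma / math.sqrt(n),       # sigma / sqrt(n)
        "skewness": skewness(means),                # shrinks toward 0 as n grows
    }
    print(n, results[n])
```

In a run of this sketch, the sample means should center near the population mean for both sample sizes, the observed spread should track sigma divided by the square root of n, and the skewness of the sample means should be much smaller for samples of 30 than for samples of 2.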
In general, the larger the number of values in a sample, the closer the sample mean tends to be to the average of the population. Since an estimate of the standard error is usually made from a sample, the sample size needs to be around 30 in order to approximate the value that would be obtained if the standard error were computed from the population. Thus, the minimum number of values needed to approximate the population with a sample is usually taken to be close to 30, and it is best to have at least this number in any sample used for hypothesis testing.

David S. Wallace

Cross-references

See also: Central Limit Theorem, Hypothesis Testing, Normal Distribution, Standard Error of the Mean, t Test-One Sample, Variance

Further Readings

Gravetter, F. J., & Wallnau, L. B. (2002). Essentials of statistics for the behavioral sciences (4th ed.). Pacific Grove, CA: Wadsworth.
Hays, W. (1994). Statistics (5th ed.). Orlando, FL: Harcourt Brace.
Howell, D. C. (1999). Fundamental statistics for the behavioral sciences (4th ed.). Pacific Grove, CA: Duxbury Press.