Measuring Variability & Inferring Population Parameters: Sampling Distributions - Prof. Da | Study notes Statistics

Sampling Distributions

Introduction

Sampling distributions represent a troublesome topic for many students. However, they are important because they are the basis for making statistical inferences about a population from a sample. One problem sampling distributions solve is to provide a logical basis for using samples to make inferences about populations. Sampling distributions also provide a measure of variability among a set of sample means. This measure of variability will, in turn, allow one to estimate the likelihood of observing a particular sample mean collected in an experiment.

At the simplest level, testing a hypothesis means testing whether an obtained sample comes from a known population, usually the general population. If the sample value is likely for the known population, then the sample probably comes from that population. If the sample value is unlikely for the known population, then it probably does not come from that population, and it can be inferred that it comes instead from a different, unknown population. If some treatment is performed, like giving a drug to improve patient recovery rates, then a sample value from the treated group allows a test of the idea that the treatment had some effect (here, on recovery rates). Does giving patients this new drug in effect create a new and different population: a population using the drug? If the average recovery rate of the treated group is very similar to, or likely for, the known population of patients who do not take the drug (the general population), then the treatment likely had no effect. If the average recovery rate is very different from, or very unlikely for, the known population of patients not taking the drug, then the treatment probably had an effect and created a new population of patients with different outcomes. Thus, some way to judge how likely a value is for the known population is needed.

The common formula used to find the probability, or likelihood, of a value for a known population (solving z-score problems) is:

    z = (x - μ) / σ

In the above formula the standard deviation, sigma (σ), gives information about how much variability exists in the population. Knowing how much variability exists in the population (the width of the distribution of scores) allows one to know how likely a single x-value is for that population. Since most values in a distribution will lie close to the mean, less likely values will fall farther from the mean. The wider the distribution of scores, the less pronounced any specific difference between a value and the mean will be. For example, if the difference between an x-value and the population mean (the numerator) remains constant, then that difference will be much more likely if the population has a very wide distribution (large denominator) than if it has a very narrow distribution (small denominator). So, any factor that increases the relative difference between a value and the mean, like a decreased spread in the distribution of scores, will lower our estimate of how likely the value is for the distribution.
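
To make the role of the denominator concrete, here is a minimal Python sketch. The numbers (a population mean of 100, a value of 110, and standard deviations of 20 and 5) are hypothetical and do not come from the text, and the probability calculation assumes the population is normally distributed. The function names `z_score` and `upper_tail_probability` are illustrative only.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Distance of a single value x from the population mean, in SD units."""
    return (x - mu) / sigma


def upper_tail_probability(x, mu, sigma):
    """P(X >= x), assuming (hypothetically) a normal population."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


# Hypothetical example: the same 10-point difference in two populations.
mu, x = 100.0, 110.0
for sigma in (20.0, 5.0):  # wide distribution, then narrow distribution
    z = z_score(x, mu, sigma)
    p = upper_tail_probability(x, mu, sigma)
    print(f"sigma={sigma:>4}: z={z:.2f}, P(X >= {x}) = {p:.4f}")
```

Under these assumptions the same 10-point difference gives z = 0.50 in the wide population and z = 2.00 in the narrow one, so the value is considerably less likely in the narrow distribution.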


However, a hypothesis test is never based on a single x-value. Instead, a sample of values is used, from which the average is computed. If the sample mean is very different from the known population mean (μ), then it can be assumed that the population the sample represents is not the same as the known population. The problem in using the above formula is that sigma gives information about how much individual values vary within a population, but nothing about how much sample means vary. Sampling distributions provide an explanation of how to measure variability among samples, and thus the probability of observing a particular sample mean.

Sampling Distribution of the Mean

Sampling distributions are theoretical, and not actually computed. However, examining the process of computing one is necessary. There are many types of sampling distributions, and a sampling distribution can be formed for any statistic. For the current discussion, the sampling distribution of the mean is most relevant. To form a sampling distribution:

Sample repeatedly and exhaustively from the population.
Calculate the statistic of interest (the mean) for each sample.
Form a distribution of the set of means obtained from the samples.

Each sample taken from the population can be of any specific size, but every possible sample of that size must be taken. Then, the average of each sample is computed in order to examine this new set of scores. The set of means obtained from the samples will form a new distribution, a sampling distribution. In this case, where the mean is computed as the statistic, it will be the sampling distribution of the mean. Every possible combination of values from the population is sampled to form a true sampling distribution. Since most populations are very large, it is impractical to actually go through the process, which is why sampling distributions remain theoretical.

The first important fact learned from the sampling distribution of the mean is that the mean of the population and the mean of the sampling distribution of means will have exactly the same value. That is, the average of the entire population of single x-values is exactly the same as the average of the set of sample means from the sampling distribution. This fact is important to hypothesis testing because, even though a single sample will not likely be exactly like the population, it will be on average. Thus, over repeated experiments, sample means will on average equal the population mean. Using a sample to make an inference about a population is therefore a logical and reasonable proposition.

The next important piece of information obtained from the sampling distribution of the mean is a measure of variability among sample means. Recall that some way to measure how much variability exists in a set of sample means is needed, so that there will be some way to gauge how likely it is to obtain a particular sample mean collected in an experiment.
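
The three steps above can be sketched in Python for a population small enough to sample exhaustively. The four-value population and the sample size of 2 are hypothetical, and the sketch assumes sampling with replacement where order matters, which is one common way to define "every possible sample". It also compares the standard deviation of the sample means with σ/√n, the standard result for the standard error of the mean under that sampling scheme.

```python
from itertools import product
from math import isclose, sqrt
from statistics import fmean, pstdev

population = [2, 4, 6, 8]  # hypothetical tiny population of x-values
n = 2                      # hypothetical sample size

# Step 1: every possible sample of size n (with replacement, order matters).
samples = list(product(population, repeat=n))

# Step 2: the statistic of interest (the mean) for each sample.
sample_means = [fmean(s) for s in samples]

# Step 3: summarize the resulting distribution of sample means.
mu = fmean(population)          # population mean
sigma = pstdev(population)      # population standard deviation
mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print(f"number of samples:        {len(samples)}")
print(f"population mean:          {mu:.3f}")
print(f"mean of sample means:     {mean_of_means:.3f}")
print(f"SD of sample means:       {sd_of_means:.3f}")
print(f"sigma / sqrt(n):          {sigma / sqrt(n):.3f}")
```

In this toy case the mean of the sample means matches the population mean, while the sample means vary less than the individual x-values do.
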
If the value obtained in a sample is unlikely for the known general population, then the population it comes from is probably different from the known

Notice that the denominator is an estimate of the standard error, and it is the same whether computing a z-test or a t-test. The distance a sample mean falls from the mean of the population is mediated by how much variability there is from sample to sample. If it is relatively unlikely to observe a certain sample (p < .05 for alpha = .05), then we can conclude that the sample probably did not come from the known population.

Finally, sampling distributions also yield information about how large a sample needs to be in order to test a hypothesis. With a sufficiently large sample size, the shape of the sampling distribution of the mean will be approximately normal regardless of the shape of the population distribution. Whether the population distribution has a normal, positively or negatively skewed, unimodal or bimodal shape, the sampling distribution of the mean will tend toward a “normal” (unimodal and symmetric) shape. That’s because when a distribution of sample means is formed, each value in the distribution is derived from a sample that contains a variety of scores from the population. Because each value in the sampling distribution is an average of these values from the population, most of the sample means will lie close to the mean of the population and create a unimodal and symmetric distribution, even if the single x-values in the population do not.

Recall that when values are used to form a sampling distribution, samples of any size can be used. However, the larger the number of values in each sample taken from the population, the more “normal” the sampling distribution will be. That’s because there will be a larger variety of values from the population in any individual sample, and the average from each sample will be more likely to approximate the average value of the entire population. As it turns out, a sample of around 30 values usually contains enough variety for those values to average out very close to the average of the population.
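
The effect of sample size on the shape of the sampling distribution can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 (a strongly positively skewed shape chosen for illustration), the sampling distribution is approximated by 5,000 random samples rather than formed exhaustively, and skewness is used as a rough stand-in for how far the shape is from symmetric. The helper names are hypothetical.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Population skewness: the mean cubed standardized deviation."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


def simulated_sample_means(n, replications, rng):
    """Means of `replications` random samples of size n from a skewed population."""
    return [
        fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (2, 10, 30):
    means = simulated_sample_means(n, 5000, rng)
    results[n] = skewness(means)
    print(f"n={n:>2}: mean of sample means={fmean(means):.3f}, "
          f"skewness={results[n]:.3f}")
```

In a run like this, the skewness of the sample means shrinks toward zero as the sample size grows, even though the population itself stays strongly skewed.
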
However, the larger the number of values taken in a sample, the closer the sample average gets to the average of the population. Since an estimate of the standard error is usually made from a sample, the sample size needs to be around 30 in order to approximate the value that would be obtained if the standard error were computed from the population. Thus, the minimum number of values needed to approximate the population with a sample is usually close to 30, and it is best to have this minimum number in any sample used for hypothesis testing.

David S. Wallace

Cross-references

See also: Central Limit Theorem, Hypothesis Testing, Normal Distribution, Standard Error of the Mean, t Test-One Sample, Variance

Further Readings

Gravetter, F. J., & Wallnau, L. B. (2002). Essentials of statistics for the behavioral sciences (4th ed.). Pacific Grove, CA: Wadsworth.

Hays, W. (1994). Statistics (5th ed.). Orlando, FL: Harcourt Brace.

Howell, D. C. (1999). Fundamental statistics for the behavioral sciences (4th ed.). Pacific Grove, CA: Duxbury Press.
