An overview of inferential statistics, which is a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data. It covers topics such as statistical inference, stratified random sampling, assigning probabilities to events, sampling distributions of the mean and of the difference between two means, and techniques for estimating and testing population parameters such as the mean and proportion. The document also discusses ANOVA and multiple comparison methods such as Tukey's test. Overall, it covers fundamental concepts in inferential statistics and sampling theory, which are crucial for understanding how to make inferences about populations from sample data.
Review Questions
o Self-selected samples
o the alternative hypothesis
o If the p-value < 0.01, there is overwhelming evidence to infer that the alternative hypothesis is false.
… innocent, a Type I error is made when ____.
o an innocent person is found guilty
… of 91-day Treasury Bills was collected. The sample mean is 4.76% and the standard deviation is 171.21. What is the unbiased estimate for the mean of the population?
o 4.76%
… 75.38 to 86.52. If the confidence level is reduced to 90%, the confidence interval for the population mean ____.
o becomes narrower
… to ____.
o The number of successes in the sample
… between what the profit for an act is and the potential profit given an optimal decision.
o an opportunity loss
o Non-response error
o A confidence level expresses the degree of certainty that an INTERVAL will include the actual value of the SAMPLE STATISTIC
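The 90% question above can be checked numerically. This is a minimal sketch under two assumptions the truncated question stem does not confirm: the quoted interval 75.38 to 86.52 was a 95% interval, and it was built as x̄ ± z·σ/√n with a standard normal critical value. The helper names `critical_z` and `rescale_interval` are illustrative, not from the source.

```python
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """Two-sided standard normal critical value for a given confidence level."""
    alpha = 1 - confidence                      # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%


def rescale_interval(lower: float, upper: float, old_conf: float, new_conf: float):
    """Re-express a z-based interval at a new confidence level (same data, same n)."""
    centre = (lower + upper) / 2                # the sample mean
    margin = (upper - lower) / 2                # z * sigma / sqrt(n)
    std_error = margin / critical_z(old_conf)   # sigma / sqrt(n)
    new_margin = critical_z(new_conf) * std_error
    return centre - new_margin, centre + new_margin


# Assumption: the quoted interval was a 95% interval.
low90, high90 = rescale_interval(75.38, 86.52, old_conf=0.95, new_conf=0.90)
print(f"90% interval: {low90:.2f} to {high90:.2f}")
```

With the same data and sample size only the critical value changes (about 1.96 at 95% versus about 1.645 at 90%), so the interval keeps its centre and becomes narrower.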
What Is Statistics?
"Statistics is a way to get information from data."
The first thing the student wants to know is the "typical" grade. We call this a measure of central location. The mean (or average) is one such measure; it is the sum of all the data values divided by the number of values. Suppose the student was told that the average grade last year was 67. Is this enough information to reduce his anxiety? The student would likely respond "no": he would like to know whether most of the grades were close to 67 or scattered far below and above the average. He needs a measure of variability. The simplest such measure is the range, which is calculated by subtracting the smallest value from the largest. Suppose the highest grade is 96 and the lowest grade is 24. The range of grades is 72. Unfortunately, the range provides little additional information: the student also wants to know how the grades are distributed between 24 and 96.
Descriptive Statistics
The median is the midpoint of the distribution: 50% of the data values lie above it and 50% below. (Note that the mean and median need not be actual observed test scores.) Finally, the mode is the most frequently occurring data value. The student might find it useful to know that the median score was 78 and the modal score was 80. He now knows that half the students scored 78 or higher and that 80 was the most frequently occurring test score. Apparently some very low test scores dragged the average down to 67. (See Figure 1.)
There are two more measures of variability that are used in statistics. The variance is the average squared deviation from the mean. To compute it, subtract the mean from each data value, square each difference, and average the squared differences. Note that if the differences are not squared, their sum is always 0. If the data values are, for example, heights in inches, the resulting variance is measured in square inches. As we move further into our study of statistics, we will often use the standard deviation as the measure of variability. The standard deviation is simply the square root of the variance, which returns the measure of variability to the same units as the data. It has many useful properties when the data are normally distributed.
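The measures described above can be computed with Python's standard `statistics` module. The grades below are hypothetical and chosen only for illustration; they are not the class data from the example, although they share its highest (96) and lowest (24) scores. The sketch follows the text in dividing by the number of values (`pvariance`, `pstdev`); when a sample is used to estimate a population, `statistics.variance` and `statistics.stdev` divide by n − 1 instead.

```python
from statistics import mean, median, mode, pstdev, pvariance

# Hypothetical test scores (illustration only, not the class data in the text).
grades = [24, 52, 61, 70, 78, 80, 80, 85, 96]

centre = mean(grades)                    # measure of central location
data_range = max(grades) - min(grades)   # largest minus smallest: 96 - 24

# Deviations from the mean always sum to 0 (up to floating-point rounding),
# which is why the variance squares them first.
deviations = [g - centre for g in grades]

# Variance as defined in the text: the average of the squared deviations.
variance = sum(d * d for d in deviations) / len(grades)
std_dev = variance ** 0.5                # square root returns to grade units

print(f"mean={centre:.2f} median={median(grades)} mode={mode(grades)} range={data_range}")
print("deviations sum to ~0:", abs(sum(deviations)) < 1e-9)
print(f"variance={variance:.2f} (pvariance: {pvariance(grades):.2f})")
print(f"std dev={std_dev:.2f} (pstdev: {pstdev(grades):.2f})")
```

As in the text, a few low scores pull the mean (about 69.6) below the median (78).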
Figure 1: Summary Statistics
7 values (odd count): 1, 3, 3, 6, 7, 8, 9; Median = 6
8 values (even count): 1, 3, 3, 4, 5, 6, 8, 9; Median = (4 + 5)/2 = 4.5
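The two lists in Figure 1 illustrate the odd and even cases. This short sketch uses only the figure's data, computes the median by hand, and cross-checks it against `statistics.median`; `manual_median` is an illustrative helper name.

```python
from statistics import median

odd_list = [1, 3, 3, 6, 7, 8, 9]      # 7 values: the middle (4th) value
even_list = [1, 3, 3, 4, 5, 6, 8, 9]  # 8 values: mean of the 4th and 5th values


def manual_median(values):
    """Sort, then take the middle value or the mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(manual_median(odd_list), median(odd_list))    # 6 6
print(manual_median(even_list), median(even_list))  # 4.5 4.5
```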
… exclusivity agreement that would give Pepsi exclusive rights to sell its products at all university facilities for the next year, with an option for future years. In return, the university would receive 35% of the on-campus revenues and an additional lump sum of $200,000 per year. Pepsi has been given 2 weeks to respond. The market for soft drinks is measured in terms of 12-ounce cans. Pepsi currently sells an average of 22,000 cans per week (over the 40 weeks of the year that the university operates). The cans sell for an average of 75 cents each. Costs, including labor, amount to 20 cents per can. Pepsi is unsure of its market share but suspects it is considerably less than 50%.
A quick analysis reveals that if its current market share were 25%, then, with an exclusivity agreement, Pepsi would sell 88,000 cans per week (22,000 is 25% of 88,000), or 3,520,000 cans per year over the 40 weeks of university operation. The profit or loss can then be calculated. The only problem is that we do not know how many soft drinks are sold weekly at the university. Pepsi assigned a recent university graduate to survey the university's students to supply the missing information. Accordingly, she organizes a survey that asks 500 students to keep track of the number of soft drinks they purchase over the next 7 days. The information we would like to acquire is an estimate of annual profits from the exclusivity agreement. The data are the numbers of cans of soft drinks consumed in 7 days by the 500 students in the sample. We can use descriptive techniques to learn more about the data. In this case, however, we are less interested in what the 500 students report than in the mean number of soft drinks consumed by all 50,000 students on campus. To accomplish this goal, we need the second branch of statistics, called inferential statistics.
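The profit calculation mentioned above can be written out. This sketch assumes (the case does not say so explicitly) that the university's 35% is taken on gross can revenue and that the 20-cent cost applies to every can Pepsi sells; `annual_profit` is a hypothetical helper.

```python
# Hypothetical profit model for the Pepsi exclusivity offer.
# Assumptions (not stated in the case): the university's 35% cut is taken on
# gross can revenue, and the 20-cent cost applies to every can sold.

WEEKS_PER_YEAR = 40           # weeks the university operates
CURRENT_WEEKLY_CANS = 22_000  # Pepsi's current average weekly sales
PRICE = 0.75                  # average selling price per can ($)
COST = 0.20                   # cost per can, including labor ($)
UNIVERSITY_SHARE = 0.35       # share of on-campus revenue paid to the university
LUMP_SUM = 200_000            # fixed annual payment to the university ($)


def annual_profit(market_share: float) -> float:
    """Estimate Pepsi's annual profit under exclusivity for an assumed current market share."""
    weekly_market = CURRENT_WEEKLY_CANS / market_share  # total cans sold on campus per week
    annual_cans = weekly_market * WEEKS_PER_YEAR        # with exclusivity Pepsi sells them all
    revenue = annual_cans * PRICE
    return revenue * (1 - UNIVERSITY_SHARE) - annual_cans * COST - LUMP_SUM


for share in (0.20, 0.25, 0.40):
    print(f"market share {share:.0%}: profit ${annual_profit(share):,.0f}")
```

Under these assumptions a 25% share gives about $812,000 a year. A smaller assumed share implies a larger campus market (the same 22,000 cans are a smaller slice of it), so the estimated profit rises as the share falls; this is why the unknown weekly total matters.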
Inferential Statistics
Inferential statistics is a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data. The population in question in this case is the soft drink consumption of the university's 50,000 students. Interviewing every student would be prohibitively expensive and extremely time-consuming. Statistical techniques make such endeavors unnecessary. Instead, we can sample a much smaller number of students (the sample size is 500) and infer from the data the number of soft drinks consumed by all 50,000 students. We can then estimate annual profits for Pepsi. When an election for political office takes place, the television networks cancel regular programming and instead provide election coverage. When the ballots are counted, the results are reported. However, for important offices such as president or senator in large states, the networks actively compete to see which will be the first to predict a winner. Winner predictions are made by using exit polls, in which a random sample of voters leaving the polling booth is asked for whom they voted. From the data, the sample proportion of voters supporting each candidate is computed.
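The inference step for the soft-drink survey can be sketched as scaling a sample mean up to the population. The 10 weekly counts below are hypothetical stand-ins for the 500 survey responses, and the sketch assumes the survey's 7 days represent a typical university week; `estimate_market` is an illustrative name.

```python
from statistics import mean

STUDENTS_ON_CAMPUS = 50_000  # population size from the case
PEPSI_WEEKLY_CANS = 22_000   # Pepsi's current weekly sales


def estimate_market(sample_counts):
    """Scale a sample mean (cans per student per week) up to the whole campus."""
    x_bar = mean(sample_counts)                # sample statistic
    weekly_total = x_bar * STUDENTS_ON_CAMPUS  # estimate of the population total
    implied_share = PEPSI_WEEKLY_CANS / weekly_total
    return x_bar, weekly_total, implied_share


# Hypothetical 7-day counts; the real survey would supply 500 of these.
sample = [0, 2, 1, 3, 0, 4, 2, 1, 0, 5]
x_bar, total, share = estimate_market(sample)
print(f"mean {x_bar:.2f} cans/student/week, campus total ~{total:,.0f}, Pepsi share ~{share:.0%}")
```

With these made-up counts the sample mean is 1.8 cans per student per week, implying about 90,000 cans a week on campus and a Pepsi share of roughly 24%. A real conclusion would also need a measure of reliability, which the text turns to next.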
Statistical inference is the process of making an estimate, prediction, or decision about a population based on sample data. Because populations are almost always very large, investigating each member of the population would be impractical and expensive. It is far easier and cheaper to take a sample from the population of interest and draw conclusions or make estimates about the population on the basis of information provided by the sample. However, such conclusions and estimates are not always going to be correct. For this reason, we build into the statistical inference a measure of reliability. There are two such measures: the confidence level and the significance level. The confidence level is the proportion of times that an estimating procedure will be correct. When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run.
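The long-run meaning of a confidence level can be checked by simulation. This sketch assumes normally distributed data with a known standard deviation (so the z-interval x̄ ± z·σ/√n applies); the population values `mu` and `sigma`, the sample size, and the seed are arbitrary choices. It counts how often the interval actually contains the true mean.

```python
import random
from statistics import NormalDist


def coverage(confidence=0.95, mu=10.0, sigma=2.0, n=30, trials=10_000, seed=1):
    """Fraction of z-intervals x_bar +/- z*sigma/sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)  # seeded for repeatability
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n  # one sample mean
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1  # this estimate was "correct"
    return hits / trials


print(f"95% intervals covered mu {coverage(0.95):.1%} of the time")
print(f"90% intervals covered mu {coverage(0.90):.1%} of the time")
```

The printed proportions should land close to 95% and 90%. Any single interval either contains μ or does not; the confidence level describes the procedure over many repetitions.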
Key Statistical Concepts
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
What can we infer about a Population’s Parameters based on a Sample’s Statistics?
Since statistical inference uses statistics to make inferences about parameters, we can make an estimate, prediction, or decision about a population based on sample data. We can apply what we know about a sample to the larger population from which it was drawn! The rationale is that large populations make investigating each member impractical and expensive. It is easier and cheaper to take a sample and make estimates about the population from the sample. However, such conclusions and estimates are not always going to be correct. For this reason, we build into the statistical inference “measures of reliability,” such as the confidence level and the significance level. The confidence level is the proportion of times that an estimating procedure will be correct. A confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time. When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run. A 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.
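The significance level can be illustrated the same way. This sketch assumes normal data with known σ and a true null hypothesis H0: μ = μ0 (all numeric values are arbitrary); a two-sided z-test at α = 0.05 should then wrongly reject H0 in about 5% of repetitions. `type_i_error_rate` is an illustrative name.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, mu0=50.0, sigma=5.0, n=25, trials=10_000, seed=2):
    """Long-run fraction of two-sided z-tests that wrongly reject a TRUE null H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]  # data really come from mu0
        z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # a wrong conclusion, since H0 is true
    return rejections / trials


print(f"alpha = 0.05: H0 wrongly rejected {type_i_error_rate(0.05):.1%} of the time")
```

Because the same seeded samples are reused, lowering α can only reduce the number of wrong rejections.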
Confidence and Significance Levels
If we use α (Greek letter "alpha") to represent the significance level, then our confidence level is 1 − α. Equivalently, the confidence level plus the significance level equals one: