review problems for statistics, Quizzes of Mathematics

practice problems for quizzes and tests for statistics

Typology: Quizzes

2020/2021

Uploaded on 04/25/2023

preya-patel-1
Final Exam Review Problems
Multiple Choice:

  1. A large-sample randomized experiment is conducted to test the difference of proportions between two independent groups. Specifically, consider the one-sided test: $H_0: p_1 - p_2 = 0$ vs. $H_A: p_1 - p_2 < 0$. The p-value is found to be 0.047. A two-sided 95% confidence interval for the difference is also calculated and is (-0.02, 0.32). With a significance level of 0.05, what is the correct decision?

o Fail to reject $H_0$, because 0 falls inside (-0.02, 0.32).

o Fail to reject $H_0$, because 0.047 < 0.05.

o Reject $H_0$, because 0.047 < 0.05.

o Cannot make a decision because the confidence interval conflicts with the p-value.

  1. Below is a histogram representing the age of death of American males in 2019. Use the histogram to determine which statement below is correct.

o The mean is between 85 and 90 years old.

o The mean and median are approximately equal.

o The median is less than 60 years old.

o The median is greater than the mean.

  1. If the p-value is larger than the significance level, we

o Fail to reject the null hypothesis, because the p-value is the probability of the null hypothesis being true.

o Fail to reject the null hypothesis, because it is likely that the observed difference occurs by random chance under the null hypothesis.

o Reject the null hypothesis, because it is unlikely that the observed difference occurs by random chance under the null hypothesis.

o Reject the null hypothesis, because the p-value is the probability of the alternative hypothesis being true.

  1. For the SAT college entrance exam, the combined scores ranged from 400 to 1600. A study recorded the combined scores from 100 students from each of three schools in a western state. The resulting scores were used to produce these boxplots. Which of the following is true?

o School 3 is more skewed to the right than School 2.

o School 3 is more symmetric than School 2.

o The median of School 1 is greater than the median of School 2.

o The third quartile of School 1 is greater than 1400.

  1. Justine is interested in showing that a majority of students in ST 311 use iPhones rather than Androids. She randomly selects 100 students and asks them which phone they have. 85 of the students respond that they use an iPhone and 15 respond Android. Define p = the population proportion of iPhone users in ST 311. Which of the following is correct if Justine would like to conduct an appropriate hypothesis test?

o $H_0: p = 0$, $H_A: p > 0$

o $H_0: p = 0.85$, $H_A: p \neq 0.85$

o $H_0: p = 0.5$, $H_A: p > 0.5$

o $H_0: p = 0.85$, $H_A: p > 0.85$

  1. An investigator would like to test the effects of a new painkiller, and will do so by comparing it to the painkiller that is currently on the market. They suspect that it may work differently on men and women. Within a random sample consisting of male and female patients, they randomly assigned half of the male patients to take the new painkiller and the remaining half to take the current one. The same procedure was used on the female patients. Which of the following best describes the design?

o Completely Randomized Design

o Matched Pairs Design

o Block Design

o Cluster Design

  1. A marketing researcher collected data on shoppers in a grocery store. After each shopper checked out, the researcher noted whether they used a basket or cart, and then rated the items in their shopping cart on a “healthiness score.” The researcher found that on average, people who shopped with baskets bought less healthy food than those who shopped with carts. This difference in “healthiness scores” was statistically significant. What can the researcher conclude? Assume all statistical tests were performed correctly.

o Shopping with a cart causes shoppers to buy more healthy food.

o Shopping with a cart is associated with buying more healthy food.

o Both A and B.

o None of the above.

  1. We are able to perform statistical inference because

o The p-value is significant.

o The sampling distribution is the same as the population distribution.

o The standard deviation is larger in sampling distributions than the population distribution.

o Sample statistics have predictable distributions.

  1. Identify the correct order (smallest to largest) of the correlation coefficients of the following 4 regression models:

o Regression 1 < Regression 2 < Regression 4 < Regression 3

o Regression 3 < Regression 4 < Regression 1 < Regression 2

o Regression 4 < Regression 3 < Regression 2 < Regression 1

o Regression 4 < Regression 3 < Regression 1 < Regression 2

  1. Which of the following is true about the correlation coefficient, r, in linear regression?

o The correlation coefficient will change if we change the units of measure.

o If the correlation coefficient is -1, then the slope of the regression line is also -1.

o If the correlation coefficient is close to 0, there is a strong linear relationship between the two variables.

o If the correlation coefficient is negative, the slope of the regression line will also be negative.

  1. James collected a random sample and was able to obtain a 95% confidence interval: (0.546, 0.798). Which of the following would result in a narrower confidence interval?

o Choosing a smaller confidence level

o Decreasing the sample size

o Estimating the parameter

o None of the above

  1. Below are the histograms of distributions A, B, and C. In which case(s) below would it be valid to use the normal distribution to make probability calculations concerning the sample mean?

o A sample from population A of size n=5.

o A sample from population B of size n=20.

o A sample from population C of size n=100.

o All of the above.

  1. Which of the following is true about the correlation coefficient r?

o The correlation coefficient is always greater than 0

o The correlation coefficient will change if we change the units of measure

o The correlation coefficient is always between -1 and +1

o If the correlation coefficient is close to 0, that means there is a strong linear relationship between the two variables

  1. If you were conducting a two-tailed t-test with a sample size of n = 24, what would the critical t value be if alpha were chosen as 5%?

o 1.

o 2.

o 1.

o 1.

  1. What is a requirement for an Experimental Design?

o Placebo

o Randomization

o Blocking

o Hawthorne Effect

  1. Researchers are interested in comparing annual cheesecake consumption between NCSU and UNC. Assuming all the conditions are satisfied, the researchers found a 95 percent confidence interval for the difference of mean cheesecake consumption to be (−0.98, 2.31). Which one is CORRECT about this confidence interval?

o We can conclude that there is more cheesecake consumption in NCSU than UNC.

o If we randomly pick one student from NCSU and one student from UNC, the student from NCSU consumes 95% more cheesecake than the student from UNC.

o In repeated sampling, the process used to calculate the interval (−0.98, 2.31) has a 95% probability of containing the population mean difference in cheesecake consumption.

o There is a 95% probability that the difference in population mean cheesecake consumption between NCSU and UNC is within (−0.98, 2.31).

  1. If a hypothesis is not rejected at the 0.10 level of significance, it

o Must be rejected at the 0.05 level of significance.

o May be rejected at the 0.05 level of significance.

o Will not be rejected at the 0.05 level of significance.

o Must be rejected at the 0.025 level of significance.

  1. The Mad Swami is an aspiring rapper who’s been producing his own songs. Over multiple platforms, his new song is averaging 350 streams with a standard deviation of 35 streams. His previous song averaged 280 streams. Let $\mu$ be the true average streams for the new song. Assume that streams follow a normal distribution. Which set of hypotheses is appropriate to test Mad Swami’s claim that the new song has, on average, more streams than the previous song?

o $H_0: \mu = 0$, $H_A: \mu > 0$

o $H_0: \mu = 350$, $H_A: \mu > 350$

o $H_0: \mu = 280$, $H_A: \mu > 280$

o $H_0: \mu = 0$, $H_A: \mu > 70$

  1. Judy wants to predict percent body fat from waist size (inches). She models their relationship by fitting a least squares line to her data: $\hat{y} = -42.7 + 1.7x$. Which of the following is the correct interpretation of the slope 1.7?

o Increasing waist size by 1 inch will lead to an increase in percent body fat by 1.7 on average.

o For every additional percent body fat, we expect waist size to increase by 1.7 inches on average.

o For every additional inch around the waist, we expect percent body fat to increase by 1.7 on average.

o Increasing percent body fat by 1 will make waist size grow by 1.7 inches on average.

  1. A national achievement test is administered annually to 3rd graders. The test has a mean score of 100 and a standard deviation of 15. If Jane's z-score is 1.20, what was her score on the test?

o 82

o 88

o 100

o 118
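
The z-score question above rests on the identity x = mean + z × SD. A minimal Python sketch of that arithmetic follows; the helper name `score_from_z` is hypothetical and introduced only for illustration.

```python
def score_from_z(z: float, mean: float, sd: float) -> float:
    """Convert a z-score back to a raw score: x = mean + z * sd."""
    return mean + z * sd


# Values taken from the problem statement: mean 100, SD 15, z = 1.20.
jane_score = score_from_z(1.20, mean=100, sd=15)
print(f"Jane's score: {jane_score:.0f}")
```

The same helper can be reused for any normal-model question that gives a z-score and asks for the raw value.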

  1. The age of best actress award winners from 1928 to 2009 is shown in the following boxplot. Which measure of center would be most appropriate to summarize this data?

o Mean

o Median

o Mode

o Mean and Median

  1. Two random samples (n = 25 and m = 30) are drawn, and a two-sample t-test for the difference in population means is conducted. The hypotheses are: $H_0: \mu_1 = \mu_2$ vs. $H_A: \mu_1 > \mu_2$. The test statistic is $t = 1.94$. What is the p-value?

o Greater than 0.

o Between 0.05 and 0.

o Between 0.025 and 0.

o Between 0.01 and 0.

32. A researcher wants to know if playing light music affects students' concentration while studying. The researcher has two identical rooms available in a library, and light music is played in one of the rooms. She then selects a random sample of students in the library and randomly assigns half of them to the music room and the other half to the regular room. After a few hours, they are asked to write down their concentration level during the study session. What kind of experimental design is used here?

o Completely randomized

o Matched pairs

o Randomized block

o Stratified sampling

  1. We are finding a confidence interval for a difference of two means, with $\alpha = 0.05$. For group 1, we sampled 30 individuals. For group 2, we sampled 50 individuals. What are the degrees of freedom, and the corresponding critical value (i.e. confidence coefficient)?

o Df = 78, $t^*$ = 1.

o Df = 49, $t^*$ = 2.

o Df = 29, $t^*$ = 2.

o Df = 30, $t^*$ = 2.

  1. Suppose the ages of ST 311 students are: 20, 24, 17, 18, 21, 19, 25, 20, 26, 19, 43, 20. If we replace age 43 with age 20, which measurement WILL NOT change?

o IQR

o Standard deviation

o Median

o Mean
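
One way to explore the question above is to compute each summary before and after the replacement. The sketch below uses Python's standard `statistics` module; the `summary` helper is a hypothetical name, and the quartiles use the module's default "exclusive" method, which may differ slightly from the quartile convention used in a given course.

```python
import statistics


def summary(values):
    """Return mean, median, sample SD, and IQR for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
    }


ages = [20, 24, 17, 18, 21, 19, 25, 20, 26, 19, 43, 20]
# Replace the single age of 43 with 20, as the problem describes.
ages_replaced = [20 if a == 43 else a for a in ages]

before = summary(ages)
after = summary(ages_replaced)
for name in before:
    status = "unchanged" if before[name] == after[name] else "changed"
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f} ({status})")
```

The comparison shows which summaries are sensitive to a single extreme value and which are resistant to it.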

  1. In a random sample of 115 American adults who did attend college, 45 said they believe in extraterrestrials. In a random sample of 110 American adults who did not attend college, 39 said they believe. Does this indicate that a difference exists between $p_1$ (the proportion of people who attended college and believe in extraterrestrials) and $p_2$ (the proportion who did not attend college and believe)? Use $\alpha = 0.05$. For full credit, you must show all work, including stating the hypotheses, checking assumptions, calculation of the test statistic, and P-value. You must also state your decision, and separately, give an interpretation in context of the problem.
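
The calculation part of the extraterrestrials problem can be sketched as a pooled two-proportion z-test. The Python below is an illustration under the usual large-sample assumptions, not a full solution: the function name `two_proportion_z_test` is hypothetical, and the hypotheses, the assumption checks in words, and the interpretation in context still have to be written out by hand.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for H0: p1 - p2 = 0 vs. HA: p1 - p2 != 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, pooled, p_value


# Counts from the problem: 45 of 115 (college), 39 of 110 (no college).
z, pooled, p_value = two_proportion_z_test(45, 115, 39, 110)

# Success/failure condition: expected counts under H0 should all be at least 10.
expected_counts = [pooled * 115, (1 - pooled) * 115, pooled * 110, (1 - pooled) * 110]
conditions_ok = min(expected_counts) >= 10

alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.3f}, conditions ok: {conditions_ok}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```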
  1. A linear regression model describing the relationship between the number of police on a college campus and the number of crimes committed there per year is summarized below. Assume all conditions are satisfied.

Simple linear regression results:

Dependent Variable: Crimes

Independent Variable: Police

Crimes = 64.047935 + 1.7270334 Police

Sample size: 20

R (correlation coefficient) = 0.

R-sq = 0.

Estimate of error standard deviation: 19.

Parameter estimates:

Parameter Estimate Std. Err. Alternative DF T-Stat P-value

Intercept 64.047935 10.422051 ≠ 0 18 6.1454254 <0.

Slope 1.7270334 0.1953343 ≠ 0 18 -------- -------

a) Interpret the slope of this regression line in the context of the problem. (3 pts)

b) The test statistic for the linear relationship is missing. Find this value (show your work, round your answer to 3 decimal places). (3 pts)

c) Based on your answer to (b), can we conclude that there is a significant linear relationship between the number of police and the number of crimes committed at the $\alpha = 5\%$ level? Explain your answer. (5 pts)
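
For part (b) of the police and crimes problem, the slope test statistic is the estimate divided by its standard error. The sketch below uses the values printed in the output; the critical value 2.101 is the two-sided 5% entry for 18 degrees of freedom from a standard t-table and is hard-coded as an assumption, because the Python standard library has no t distribution.

```python
# Values copied from the regression output in the problem.
slope = 1.7270334
std_err = 0.1953343
sample_size = 20

df = sample_size - 2       # simple linear regression: n - 2 degrees of freedom
t_stat = slope / std_err   # tests H0: slope = 0

# Assumption: two-sided 5% critical value for 18 df, taken from a t-table.
T_CRIT_DF18 = 2.101

print(f"t = {t_stat:.3f} on {df} df")
print("Significant at 5%" if abs(t_stat) > T_CRIT_DF18 else "Not significant at 5%")
```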
  1. The cost for Tamiflu this flu season is normally distributed with a mean of $162 with a standard deviation of $12.

a) What is the probability that a person pays more than $174 for Tamiflu this season?

A random sample of 18 pharmacies is selected and the price of Tamiflu is recorded.

b) Determine the mean of the sampling distribution of the sample mean of the cost of Tamiflu at 18 randomly selected pharmacies.

c) Determine the standard deviation of the sampling distribution of the sample mean of the cost of Tamiflu at 18 randomly selected pharmacies.

d) Calculate the probability that a random sample of 18 pharmacies has a mean cost of less than $155 for Tamiflu.
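
The Tamiflu problem combines a normal probability for one observation with the sampling distribution of a sample mean, whose standard deviation is σ/√n. A minimal sketch using `statistics.NormalDist` from the Python standard library follows; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 162, 12, 18  # values from the problem statement

# a) P(X > 174) for a single purchase.
price = NormalDist(mu, sigma)
p_over_174 = 1 - price.cdf(174)

# b) and c) Mean and standard deviation of the sampling distribution of x-bar.
mean_of_xbar = mu
sd_of_xbar = sigma / sqrt(n)

# d) P(x-bar < 155) for a random sample of 18 pharmacies.
xbar = NormalDist(mean_of_xbar, sd_of_xbar)
p_xbar_under_155 = xbar.cdf(155)

print(f"a) {p_over_174:.4f}")
print(f"b) {mean_of_xbar}")
print(f"c) {sd_of_xbar:.4f}")
print(f"d) {p_xbar_under_155:.4f}")
```

The same pattern applies to any question that asks for the probability of a sample mean when the population is normal or the sample is large.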
  1. It’s become a common goal for a person to take 10,000 steps per day, thanks to the availability of fitness tracking devices in recent years. A study is being done to determine the average number of steps taken per day by people in a particular metro area. Random samples of 40 adults are selected from each of “city”, “suburb”, and “rural” locations. Each participant is given a fitness tracker and told to go about their lives as usual. The number of steps taken on a random day is recorded for each participant.

a. What type of sampling technique was used in this study?

b. What is the parameter of interest?

c. From past studies, we know that in rural locations the mean number of steps was 9228 with a standard deviation of 1771 steps. What is the probability that a random sample of 40 rural residents has a mean number of steps of 10,000 or more?
  1. Data is collected on the amount of money a company spends on Radio advertising (in thousands of dollars) and the Sales (number of units sold, in thousands of units) by the company. A simple linear regression is conducted. The output is given below.

Simple linear regression results:

Sales = 12.235722 + 0.12443166 Radio

Sample size: 200

R (correlation coefficient) = 0.

R-sq = 0.

Estimate of error standard deviation: 4.

Parameter estimates:

Parameter Estimate Std. Err. Alternative DF T-Stat P-value

Intercept 12.235722 0.65348629 ≠ 0 198 18.723762 <0.

Slope 0.12443166 0.023696033 ≠ 0 198 5.

a. Circle one each: There is a ( weak / strong ) and ( positive / negative ) linear correlation.

b. Fill in the blank: As the amount of money spent on radio advertising increases, the number of units sold ____________________________.

c. Interpret the estimate of the slope in the context of the problem.

d. Is there a significant linear relationship? Justify your answer.

e. Data is also collected on the amount of money spent on Newspaper advertising. A linear regression model is fit to this data. Both scatterplots, with the regression lines, are provided below. Which variable (Radio advertising or Newspaper advertising) does a better job predicting Sales? Explain your choice.