STA 199 MIDTERM 1 VERSION B, Exams of Statistics

Duke University Statistics

Prepare for STA 199 Midterm 1 Version B with this comprehensive study resource. Includes practice questions, accurate answers, and detailed explanations covering statistical methods, probability concepts, data analysis, distributions, and introductory inference techniques. Designed for undergraduate students to strengthen understanding, improve problem-solving skills, and excel in exams. Perfect for revision, self-assessment, and mastering key concepts in introductory statistics and data science.

Typology: Exams

2025/2026

Available from 03/30/2026

brainfuel-brainfuel 🇺🇸

617 documents


Section 1: Foundations of Data & Experimental Design (Questions 1-15)

1. A researcher wants to study the average screen time of all college students

in California. She randomly selects 1,500 students from 10 different California

universities and records their daily screen time. What is the population in this

study?

A) The 1,500 selected students

B) Daily screen time

C) All college students in California

D) The 10 universities

Rationale: The population is the entire group of interest, which is all college

students in California. The sample is the 1,500 selected students.

2. A study finds a strong positive correlation between ice cream sales and the

number of drownings. This is an example of:

A) A causal relationship

B) A confounding variable (temperature)

C) A well-designed experiment

D) A negative association

Rationale: This is a classic example of a confounding variable (lurking variable).

Hot weather causes both ice cream sales and swimming/drownings to increase,

creating a correlation without causation.

3. To determine if a new drug lowers blood pressure, researchers randomly

assign 200 patients to receive either the new drug or a placebo. Neither the

patients nor the doctors measuring the blood pressure know which treatment

was given. This is best described as a:

A) Observational study

B) Randomized, double-blind, controlled experiment

C) Block design with matching

D) Retrospective case-control study

Rationale: The key features are random assignment (experiment), both subjects

and evaluators blinded (double-blind), and a comparison group (controlled).

4. Which of the following variables is continuous?

A) Number of siblings

B) Time to run 100 meters (in seconds)

C) Zip code

D) Final exam grade (A, B, C, D, F)

Rationale: Continuous variables can take any value within a range (e.g., 12.

seconds). Number of siblings is discrete; zip code is nominal; letter grades are

ordinal.

5. A survey asks respondents to rate their satisfaction with a product on a

scale from 1 (Very Unsatisfied) to 5 (Very Satisfied). This is an example of

what type of data?

A) Nominal

B) Ordinal

C) Interval

D) Ratio

Rationale: Ordinal data have a meaningful order (1 < 2 < 3), but the difference

between values is not necessarily uniform or meaningful in a mathematical sense.

6. A university wants to ensure its student survey represents the proportion of

freshmen, sophomores, juniors, and seniors. They randomly select 100

students from each class year. This sampling method is:

A) Simple random sample

B) Cluster sample

C) Stratified random sample

D) Convenience sample

Rationale: The population is divided into strata (class years), and random samples

are taken from each stratum.

C) The researcher's bias in measuring outcomes

D) A type of sampling error

Rationale: The placebo effect is a real, measurable physiological or psychological

response to an inert substance or procedure.

12. A researcher is studying the effect of sleep deprivation on test scores. The

amount of sleep a participant gets is the:

A) Response variable

B) Confounding variable

C) Explanatory variable

D) Lurking variable

Rationale: The explanatory variable (independent variable) is the one that is

manipulated or used to explain changes in the response variable (dependent

variable).

13. Which of the following is a statistic?

A) The true average height of all women in the world (μ)

B) The average height of 50 women randomly selected from New York (x̄)

C) The population proportion (p)

D) A fixed, unknown value

Rationale: A statistic is a numerical summary calculated from a sample.

Population parameters (μ, p) are fixed but usually unknown.

14. If a sample is biased, it means:

A) The sample size is too small.

B) The sample was not selected randomly.

C) The sample does not accurately represent the target population.

D) The sample has a large standard deviation.

Rationale: Bias is systematic error in sampling that leads to a sample that is not

representative of the population from which it was drawn.

15. A study follows a group of 1,000 healthy adults over 20 years to see who

develops heart disease and how their lifestyle factors relate to it. This is a:

A) Cross-sectional study

B) Prospective cohort study

C) Retrospective case-control study

D) Randomized experiment

Rationale: A prospective study follows a group (cohort) forward in time to

observe outcomes.

Section 2: Descriptive Statistics & Data Visualization (Questions 16-35)

16. Which measure of center is most affected by outliers?

A) Median

B) Mode

C) Mean

D) Interquartile Range

Rationale: The mean uses all values, so extreme outliers can pull it in their

direction. The median is resistant to outliers.
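
The contrast between the two measures can be seen with a tiny dataset. This is a minimal sketch using made-up values (the numbers are illustrative and do not come from the exam):

```python
from statistics import mean, median

# Hypothetical data: five typical values, then the same data plus one extreme outlier.
values = [10, 12, 13, 14, 16]
with_outlier = values + [100]

print(mean(values), median(values))              # 13 13
print(mean(with_outlier), median(with_outlier))  # 27.5 13.5
```

One extreme value more than doubles the mean, while the median moves by only half a unit.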

17. For a skewed right distribution, which relationship between the mean and

median is most likely?

A) Mean > Median

B) Mean < Median

C) Mean = Median

D) Cannot be determined

Rationale: In a right-skewed distribution, the tail pulls the mean to the right,

making it larger than the median.

18. The standard deviation is best described as:

A) The average of the data points

B) The middle value of the data

C) The typical distance of data points from the mean

D) The range of the middle 50% of the data

Rationale: The standard deviation is a measure of spread that quantifies how much

individual data points deviate from the mean on average.

19. The five-number summary includes:

A) Mean, Median, Mode, Range, Standard Deviation

B) Minimum, Q1, Median, Q3, Maximum

C) Mean, Standard Deviation, Variance, Range

D) Z-scores, Percentiles, Quartiles

C) 625

D) Cannot be determined

Rationale: The standard deviation is the square root of the variance. √25 = 5.

25. A side-by-side box plot is best used to:

A) Show the distribution of one categorical variable

B) Compare the distribution of a quantitative variable across multiple groups

C) Show the relationship between two quantitative variables

D) Display the frequency of individual data points

Rationale: Side-by-side box plots allow for visual comparison of center, spread,

and shape across different categories.

26. A scatterplot is used to examine the relationship between:

A) Two categorical variables

B) Two quantitative variables

C) One quantitative and one categorical variable

D) A variable and itself over time

Rationale: Scatterplots are the standard graphical tool for visualizing association

between two quantitative variables.

27. The correlation coefficient (r) measures:

A) The slope of the regression line

B) The strength and direction of a linear relationship

C) The percentage of variation explained

D) The causality between two variables

Rationale: The correlation coefficient, r, ranges from -1 to +1 and quantifies the

linear association's strength and direction.
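
For readers who want to see how r is computed, here is a minimal sketch of the Pearson correlation written from its definition. The helper name `pearson_r` and the data are illustrative assumptions, not part of the exam:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their individual variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [3, 5, 7, 9, 11])    # points on a rising line
r_down = pearson_r(xs, [11, 9, 7, 5, 3])  # points on a falling line
print(r_up, r_down)  # 1.0 -1.0
```

Perfectly linear data give r = +1 or r = -1, and the sign matches the direction of the line.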

28. A correlation of r = 0.92 indicates:

A) A weak, negative linear relationship

B) A strong, positive linear relationship

C) A weak, positive linear relationship

D) No linear relationship

Rationale: A correlation close to +1 indicates a strong positive linear relationship.

29. Which of the following is a resistant measure of spread?

A) Standard Deviation

B) Variance

C) Range

D) Interquartile Range (IQR)

Rationale: IQR is based on percentiles and is not influenced by extreme values,

unlike range, variance, and standard deviation.
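
A short sketch of this resistance, assuming made-up data and an illustrative helper `iqr` built on the standard library's `statistics.quantiles` (which uses its default "exclusive" quartile method here; other quartile conventions give slightly different numbers):

```python
from statistics import quantiles, stdev

def iqr(data):
    """Interquartile range: Q3 minus Q1."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

# Hypothetical data: the second list replaces the largest value with an extreme one.
clean = [2, 4, 6, 8, 10, 12, 14, 16, 18]
with_outlier = [2, 4, 6, 8, 10, 12, 14, 16, 180]

print(iqr(clean), iqr(with_outlier))                           # identical
print(max(clean) - min(clean), max(with_outlier) - min(with_outlier))  # range grows
print(stdev(clean), stdev(with_outlier))                       # standard deviation grows
```

The IQR is unchanged because the quartiles depend only on the middle of the ordered data.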

30. In a perfectly symmetrical, bell-shaped distribution, approximately what

percentage of data falls within 2 standard deviations of the mean?

A) 68%

B) 95%

C) 99.7%

D) 100%

Rationale: According to the Empirical Rule for normal distributions, about 95% of

data lies within 2 standard deviations of the mean.
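
The three percentages in the Empirical Rule can be recovered from the normal cumulative distribution function. A minimal sketch using the standard library's `statistics.NormalDist` (the helper name `within_k_sd` is illustrative):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def within_k_sd(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))  # 0.6827, 0.9545, 0.9973
```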

31. A stem-and-leaf plot has the advantage over a histogram of:

A) Being better for categorical data

B) Preserving the original data values

C) Always showing the exact shape

D) Being easier to create for large datasets

Rationale: Stem-and-leaf plots show the distribution while retaining each

individual data point.

32. What is the median of the following dataset: 4, 8, 12, 16, 20?

A) 4

B) 8

C) 12

D) 16

Rationale: The median is the middle value in an ordered list. The dataset has 5

values, so the 3rd value is 12.

33. The 90th percentile of a dataset means:

A) 90% of the data is above that value.

B) 90% of the data is below that value.

C) The value is 90% of the mean.

D) The value is 90 standard deviations from the mean.

Rationale: Independence means the occurrence of one event does not affect the

probability of the other; thus, the conditional probability equals the marginal

probability.

38. A fair six-sided die is rolled. What is the probability of rolling a number

greater than 4?

A) 1/

B) 1/

C) 1/

D) 2/

Rationale: Numbers greater than 4 are 5 and 6, so there are 2 favorable outcomes out of 6 total: 2/6 = 1/3.

39. A card is drawn from a standard 52-card deck. What is the probability it is

a heart or a king?

A) 13/52 + 4/

B) 13/52 + 4/52 - 1/

C) 13/52 * 4/

D) 13/52 + 4/

Rationale: Using the addition rule: P(Heart or King) = P(Heart) + P(King) -

P(Heart and King) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13.
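
The addition rule can be checked by counting cards directly. A minimal sketch that builds a 52-card deck and compares the count with the formula (the variable names are illustrative):

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Count every card that is a heart or a king; the king of hearts is counted once.
favorable = [card for card in deck if card[1] == "hearts" or card[0] == "K"]
p_enumerated = Fraction(len(favorable), len(deck))

# Addition rule: subtract the overlap so it is not counted twice.
p_formula = Fraction(13, 52) + Fraction(4, 52) - Fraction(1, 52)
print(p_enumerated, p_formula)  # 4/13 4/13
```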

40. The probability that it rains today is 0.3. The probability that it rains

tomorrow is 0.4. Assuming independence, what is the probability it rains on

both days?

A) 0.

B) 0.

C) 0.

D) 0.

Rationale: For independent events, P(A and B) = P(A) * P(B) = 0.3 * 0.4 = 0.12.

41. A bag contains 5 red marbles and 3 blue marbles. Two marbles are drawn

without replacement. What is the probability that both are red?

A) (5/8)*(5/8)

B) (5/8)*(4/8)

C) (5/8)*(4/7)

D) (5/8)+(4/7)

Rationale: P(1st red) = 5/8. P(2nd red | 1st red) = 4/7. Multiply for "and" in

conditional situations.
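
As a check on the multiplication, the sketch below computes the product exactly and then confirms it by listing every ordered pair of draws. The variable names are illustrative:

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule: P(1st red) * P(2nd red | 1st red)
p_both_red = Fraction(5, 8) * Fraction(4, 7)

# Brute-force check: every ordered draw of two distinct marbles from the bag.
marbles = ["R"] * 5 + ["B"] * 3
draws = list(permutations(marbles, 2))
both_red = sum(1 for first, second in draws if first == second == "R")
p_enumerated = Fraction(both_red, len(draws))

print(p_both_red, p_enumerated)  # 5/14 5/14
```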

42. If P(A) = 0.6, P(B) = 0.5, and P(A and B) = 0.3, then events A and B are:

A) Mutually exclusive only

B) Independent only

C) Both mutually exclusive and independent

D) Neither mutually exclusive nor independent

Rationale: Check independence: P(A) * P(B) = 0.6 * 0.5 = 0.3 = P(A and B), so they

are independent. Since P(A and B) ≠ 0, they are not mutually exclusive.
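
Both checks in this rationale are one-line comparisons. A minimal sketch, where the helper names `is_independent` and `is_mutually_exclusive` are illustrative:

```python
from math import isclose

def is_independent(p_a, p_b, p_a_and_b):
    """Independent events satisfy P(A and B) = P(A) * P(B)."""
    return isclose(p_a * p_b, p_a_and_b)

def is_mutually_exclusive(p_a_and_b):
    """Mutually exclusive events can never occur together."""
    return p_a_and_b == 0

print(is_independent(0.6, 0.5, 0.3))  # True
print(is_mutually_exclusive(0.3))     # False
```

`isclose` is used instead of `==` because products of decimals are not always exact in floating point.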

43. A test for a disease has a 95% sensitivity (true positive rate) and a 90%

specificity (true negative rate). If 2% of the population has the disease, what is

the probability that a person actually has the disease given they tested

positive? This requires:

A) Binomial distribution

B) Bayes' Theorem

C) Law of Large Numbers

D) Addition Rule

Rationale: Bayes' Theorem is used to update the probability of an event based on

new evidence (a positive test result).
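
The question only asks which tool applies, but the calculation itself is short. A minimal sketch of Bayes' Theorem with the numbers from the question (the function name `posterior_given_positive` is illustrative):

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' Theorem."""
    true_positive = sensitivity * prevalence                # P(positive and disease)
    false_positive = (1 - specificity) * (1 - prevalence)   # P(positive and no disease)
    return true_positive / (true_positive + false_positive)

p = posterior_given_positive(prevalence=0.02, sensitivity=0.95, specificity=0.90)
print(round(p, 3))  # 0.162
```

Even with a fairly accurate test, only about 16% of positive results come from people who have the disease, because the disease is rare and false positives outnumber true positives.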

44. The complement of an event A, denoted A^c, has probability:

A) 1 - P(A)

B) P(A) - 1

C) 1 / P(A)

D) 0

Rationale: The complement rule states that P(A) + P(A^c) = 1.

45. A probability distribution of a discrete random variable must satisfy:

A) Each probability is between 0 and 1, and they sum to 1.

B) Each probability is between -1 and 1.

C) The sum of probabilities is 0.

D) The probabilities must be all equal.

Rationale: This is a fundamental requirement for a valid probability mass

function.

C) (0.2)^1 * (0.8)^

D) (0.2)^1 * (0.8)^

Rationale: P(X=1) = C(5,1) * (0.2)^1 * (0.8)^4 = 5 * 0.2 * (0.8)^4.
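
The formula in this rationale can be evaluated directly. A minimal sketch, assuming the values n = 5, p = 0.2, and k = 1 read off the rationale; `binomial_pmf` is an illustrative helper name:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_one = binomial_pmf(k=1, n=5, p=0.2)
print(round(p_one, 4))  # 0.4096
```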

51. Which of the following is NOT a property of the normal distribution?

A) It is symmetric about its mean.

B) It is discrete.

C) The mean, median, and mode are equal.

D) It is bell-shaped.

Rationale: The normal distribution is continuous, not discrete.

52. The standard normal distribution has a mean of _____ and a standard

deviation of _____.

A) 0, 1

B) 1, 0

C) 0, 0

D) 1, 1

Rationale: Z ~ N(μ=0, σ=1).

53. If X ~ N(μ = 100, σ = 15), what is the probability that X is greater than 130?

A) P(Z < 2)

B) P(Z > 2)

C) P(Z < -2)

D) 1 - P(Z > 2)

Rationale: z = (130-100)/15 = 2. P(X > 130) = P(Z > 2).
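
To put a number on P(Z > 2), the sketch below uses the standard library's `statistics.NormalDist`, treating 15 as the standard deviation as the rationale does:

```python
from statistics import NormalDist

x = NormalDist(mu=100, sigma=15)

z_score = (130 - x.mean) / x.stdev          # standardize the cutoff
p_upper = 1 - x.cdf(130)                    # P(X > 130)
p_upper_z = 1 - NormalDist().cdf(z_score)   # P(Z > 2), the same tail on the z scale

print(z_score, round(p_upper, 4))  # 2.0 0.0228
```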

54. The 68-95-99.7 rule applies to:

A) Any distribution

B) Normal distributions

C) Skewed distributions

D) Binomial distributions

Rationale: This empirical rule is specific to normal distributions.

55. A random variable that can take on any value within an interval is a:

A) Discrete random variable

B) Continuous random variable

C) Binomial random variable

D) Categorical variable

Rationale: Continuous random variables have an uncountable number of possible

values (e.g., time, weight).

Section 4: Sampling Distributions & Inference Foundations (Questions 56-75)

56. The sampling distribution of a statistic is:

A) The distribution of values in a single sample.

B) The distribution of the statistic from all possible samples of the same size.

C) The distribution of the population.

D) The distribution of the standard deviation.

Rationale: It's the theoretical distribution of a statistic (like the sample mean)

across repeated sampling.

57. According to the Central Limit Theorem (CLT), for a sufficiently large

sample size, the sampling distribution of the sample mean will be

approximately normal regardless of:

A) The sample size

B) The population mean

C) The shape of the population distribution

D) The population standard deviation

Rationale: The CLT's power is that normality of the sampling distribution holds

for non-normal populations if n is large.
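
A small simulation can make the CLT concrete. This is a sketch under stated assumptions: the population is exponential with mean 1 (strongly right-skewed), the sample size is 50, and the number of repetitions and the seed are arbitrary choices, not values from the exam:

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n, reps=2000):
    """Draw `reps` samples of size n from an exponential population and return their means."""
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

means = sample_means(n=50)
print(mean(means))   # close to the population mean, 1
print(stdev(means))  # close to sigma / sqrt(n) = 1 / sqrt(50), about 0.14
```

Although individual exponential values are far from normal, the sample means cluster symmetrically around 1 with a spread near σ/√n.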

58. The mean of the sampling distribution of the sample mean (μ_x̄) is equal

to:

A) σ/√n

B) μ (the population mean)

C) x̄ (the sample mean)

D) s (the sample standard deviation)

Rationale: The sample mean is an unbiased estimator of the population mean.

59. The standard deviation of the sampling distribution of the sample mean is

called the:

A) Standard deviation

C) p

D) np

Rationale: The sample proportion is an unbiased estimator of the population

proportion p.

64. Bias refers to:

A) The variability of an estimator

B) The difference between the expected value of an estimator and the

parameter

C) The standard error of an estimator

D) The sample size

Rationale: Bias = E(estimator) - parameter. An unbiased estimator has bias = 0.
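
The definition can be applied exactly on a tiny example. The sketch below is an illustration chosen for this note, not taken from the exam: it uses a hypothetical population {1, 2, 3}, lists every equally likely sample of size 2 drawn with replacement, and compares two variance estimators (dividing by n versus n - 1):

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical population
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # the parameter

def variance_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

samples = list(product(population, repeat=2))  # all 9 equally likely samples

def expected_value(divisor):
    return sum(variance_estimate(s, divisor) for s in samples) / len(samples)

bias_divide_by_n = expected_value(2) - sigma2          # divisor n = 2
bias_divide_by_n_minus_1 = expected_value(1) - sigma2  # divisor n - 1 = 1
print(bias_divide_by_n, bias_divide_by_n_minus_1)  # -1/3 0
```

Dividing by n underestimates the population variance on average, while dividing by n - 1 gives bias 0 in this setting.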

65. Which of the following will reduce the margin of error in a confidence

interval?

A) Increasing the confidence level

B) Increasing the sample size

C) Decreasing the sample size

D) Increasing the population standard deviation

Rationale: Margin of error = z* (σ/√n). Increasing n decreases the margin of error.
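
The effect of n on the formula is easy to see numerically. A minimal sketch, assuming a known σ of 10 and a 95% confidence level (both illustrative choices); `margin_of_error` is a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """z* times sigma / sqrt(n), with z* taken from the standard normal distribution."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)

moe_100 = margin_of_error(sigma=10, n=100)
moe_400 = margin_of_error(sigma=10, n=400)
print(round(moe_100, 2), round(moe_400, 2))  # 1.96 0.98
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.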

66. A 95% confidence interval for a population mean is (45, 55). This means:

A) 95% of the population data falls between 45 and 55.

B) There is a 95% probability that the true mean is between 45 and 55.

C) In repeated sampling, 95% of such intervals will contain the true

population mean.

D) The sample mean is 50 with 95% certainty.

Rationale: This is the correct frequentist interpretation of a confidence interval.

67. If all else remains the same, increasing the confidence level from 90% to

95% will cause the confidence interval to become:

A) Wider

B) Narrower

C) The same width

D) Impossible to determine

Rationale: A higher confidence level requires a larger critical value (z*), which

increases the margin of error.
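
The critical values themselves show the effect. A minimal sketch comparing z* at two common confidence levels, 90% and 95% (the helper name `z_star` is illustrative):

```python
from statistics import NormalDist

def z_star(confidence):
    """Two-sided critical value: the z with `confidence` of the area between -z and +z."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

z_90 = z_star(0.90)
z_95 = z_star(0.95)
print(round(z_90, 3), round(z_95, 3))  # 1.645 1.96
```

With σ and n held fixed, the margin of error scales with z*, so the 95% interval is wider than the 90% interval.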

68. The t-distribution is used for inference about a population mean when:

A) The sample size is large

B) The population standard deviation (σ) is unknown

C) The population is not normal

D) The sample proportion is being estimated

Rationale: The t-distribution accounts for the additional uncertainty when

estimating σ with the sample standard deviation s.

69. As the degrees of freedom increase, the t-distribution:

A) Approaches the standard normal distribution

B) Becomes more skewed

C) Becomes more spread out

D) Has a mean that increases

Rationale: The t-distribution has heavier tails than the normal, but as df → ∞, it

converges to N(0,1).

70. A Type I error in hypothesis testing is:

A) Failing to reject a false null hypothesis

B) Rejecting a true null hypothesis

C) Rejecting a false null hypothesis

D) Failing to reject a true null hypothesis

Rationale: Type I error = false positive. The probability of a Type I error is α

(significance level).

71. The p-value of a hypothesis test is:

A) The probability the null hypothesis is true.

B) The probability of obtaining results as extreme as or more extreme than the

observed results, assuming H0 is true.

C) The probability of making a Type II error.

D) The significance level.

Rationale: The p-value measures the strength of evidence against the null

hypothesis.

72. If the p-value is less than the significance level (α), we:

A) Fail to reject H0

B) Reject H0

B) There is evidence that more than 50% of voters favor the candidate.

C) The sample proportion was exactly 0.55.

D) 95% of voters fall between 52% and 58%.

Rationale: Since the entire interval is above 0.50, the data provide evidence that

the true population proportion is > 0.50.

77. A study reports a p-value of 0.03 for a two-tailed test at α = 0.05. The

correct conclusion is:

A) Fail to reject H0; there is not sufficient evidence.

B) Reject H0; there is sufficient evidence.

C) Accept H0; the null hypothesis is true.

D) The result is not statistically significant.

Rationale: Since p-value (0.03) < α (0.05), we reject the null hypothesis.

78. Which of the following is an example of a non-sampling error?

A) Using a small sample size

B) A poorly worded survey question that confuses respondents

C) Natural variability in the sample

D) The margin of error

Rationale: Non-sampling errors include measurement error, non-response bias,

and processing errors, not related to the randomness of sampling.

79. A researcher fails to reject the null hypothesis when the null hypothesis is

actually false. This is a:

A) Type I error

B) Type II error

C) Correct decision

D) Sampling error

Rationale: Type II error (β) is failing to reject a false null hypothesis.

80. Which of the following is the most important factor in determining the

reliability of a study's conclusions?

A) The cost of the study

B) The number of researchers involved

C) The design (e.g., randomization, sample representativeness) and

appropriate sample size

D) The journal in which it was published

Rationale: A study's validity and reliability are determined by its methodological

rigor, including design, sampling, and analysis, not superficial factors.