MAT 167: Statistics
Final Exam
Instructor: Anthony Tanbakuchi
Spring 2009
Name:
Computer / Seat Number:
No books, notes, or friends. Show your work. You may use the attached
equation sheet, R, and a calculator. No other materials. If you choose to use R,
write what you typed on the test. Using any other program or having any other
documents open on the computer will constitute cheating.
You have until the end of class to finish the exam; manage your time wisely.
If something is unclear, quietly come up and ask me.
If the question is legitimate, I will inform the whole class.
Express all final answers to 3 significant digits. Probabilities should be given as a
decimal number unless a percent is requested. Circle final answers; ambiguous or
multiple answers will not be accepted. Show steps where appropriate.
The exam consists of 24 questions for a total of 71 points on 9 pages.
This Exam is being given under the guidelines of our institution’s
Code of Academic Ethics. You are expected to respect those guidelines.
Points Earned: out of 71 total points
Exam Score:
1. The following is a partial list of statistical methods that we have discussed:
1. mean
2. median
3. mode
4. standard deviation
5. z-score
6. percentile
7. coefficient of variation
8. scatter plot
9. histogram
10. Pareto chart
11. box plot
12. normal-quantile plot
13. confidence interval for a mean
14. confidence interval for difference in means
15. confidence interval for a proportion
16. confidence interval for difference in proportions
17. one sample mean test
18. two independent sample mean test
19. matched pairs test
20. one sample proportion test
21. two sample proportion test
22. test of homogeneity
23. test of independence
24. linear correlation coefficient & test
25. regression
26. 1-way ANOVA
For each situation below, which method is most applicable?
(a) (1 point) A researcher would like to estimate the mean weight of javelina.
(b) (1 point) A researcher wants to determine if bear weights are normally distributed.
(c) (1 point) An education researcher wants to determine if the probability a student will
graduate from middle school is affected by their economic status (poor, lower middle class,
middle class, ...).
(d) (1 point) A farmer wants to determine if the mean crop yield is the same for eight different
brands of fertilizer.
(e) (1 point) A fertility researcher wants to determine if a new drug can decrease the
proportion of infertile mice. Twenty mice are randomly divided into two groups, a treatment
group and a control group.
2. (1 point) Which test is a many-sample generalization of the two-sample t-test?
3. (1 point) If the mean, median, and mode for a data set are different, what can you conclude
about the data’s distribution?
13. Provide short, succinct written answers to the following conceptual questions.
(a) (1 point) Give an example of a categorical type of variable.
(b) (1 point) Which of the following measures of variation is least susceptible to outliers:
standard deviation, inter-quartile range, range
(c) (1 point) What percent of data is greater than Q3?
(d) (1 point) What does the standard deviation represent conceptually in words? (Be concise
but don’t simply state the equation in words verbatim.)
(e) (1 point) Why would an SAT percentile be preferred over a raw SAT score by college
admissions committees?
14. (2 points) Car tires must not deform or explode when inflated up to their maximum pressure
rating. Before distributing the tires, they must be tested. To test the safety of tires, an
inspector randomly samples 50 tires (without replacement) from a batch of 5,000 that have
been manufactured. The inspector inflates each of the fifty tires until they explode or deform
to make sure they meet the minimum safety requirements. If none of the sampled tires fails
the test, the tires will be distributed to dealers. If the batch contains 15 defective tires that
will explode if selected, what is the probability that the batch will be rejected?
15. (2 points) If a class consists of 20 males and 8 females, what is the probability of drawing 4
females without replacement?
16. (2 points) You would like to conduct a study to estimate (at the 95% confidence level) the
proportion of households that own one or more encyclopedias. What sample size do you need
to estimate the proportion with a margin of error of 2%?
17. The following questions regard hypothesis testing in general.
(a) (1 point) When we conduct a hypothesis test, we assume something is true and calculate
the probability of observing the sample data under this assumption. What do we assume
is true?
(b) (1 point) If you reject $H_0$ but $H_0$ is true, what type of error has occurred? (Type I or
Type II)
(c) (1 point) What variable represents the actual Type I error?
(d) (1 point) What does the power of a hypothesis test represent?
18. Eighteen students were randomly selected to take the SAT after having either no breakfast or
a complete breakfast. A researcher would like to test the claim that students who eat breakfast
score higher than students who do not.
Group without breakfast (SAT score): 480 510 530 540 550 560 600 620 660
Group with breakfast (SAT score):    460 500 530 520 580 580 560 640 690
[Figure: box plots of head breadth (cm) by epoch: 150 AD, 1850 BC, 4000 BC]
(a) (1 point) What type of hypothesis test (of those discussed in class) should you use?
(b) (1 point) What is the alternative hypothesis for this test?
(c) (1 point) What alpha will you use?
(d) (1 point) What is the response variable for this study?
(e) (1 point) What is the factor variable for this study?
(f) (1 point) The analysis of the data was run and the output is shown below. What is your
final conclusion (not the formal decision)?
              Df Sum Sq Mean Sq F value Pr(>F)
epoch          2 138.74   69.37    4.05 0.
Residuals     24 411.11   17.
(g) (1 point) Assuming the researcher rejected the null hypothesis, what is the probability of
a Type I error for this study?
20. The following table lists the fuel consumption (in miles/gallon) and weight (in lbs) of seven
vehicles.
Weight (lbs)         3180 3450 3225 3985 2440 2500 2290
MPG (miles/gallon)     27   29   27   24   37   34   37
(a) (2 points) Upon looking at the scatter plot of the data, the relationship between fuel consumption
and weight looks linear. Is the linear relationship statistically significant? (Justify your
answer with an analysis.)
(b) (1 point) What percent of a vehicle’s fuel consumption can be explained by its weight?
(c) (2 points) You are designing a new vehicle and would like to be able to predict its fuel
consumption. Write the equation for the fitted model (with the actual values of the
coefficients).
(d) (1 point) For what range of vehicle weights is the model valid for predicting fuel
efficiency?
(e) (1 point) What is the best predicted fuel consumption for a new vehicle that weighs 3200
lbs?
(f) (1 point) If the linear relationship had not been statistically significant, what is the best
predicted fuel consumption for a new vehicle that weighs 3200 lbs?
23. Engineers must consider the breadths of male heads when designing motorcycle helmets. Men
have head breadths that are normally distributed with a mean of 6.0 in and a standard deviation
of 1.0 in (based on anthropometric survey data from Gordon, Churchill, et al.).
(a) (2 points) If 1 man is randomly selected, find the probability that his head breadth is
greater than 6.1 in.
(b) (2 points) If 100 men are randomly selected, find the probability that their mean head
breadth is greater than 6.1 in.
24. (2 points) Given $y = \{a, -2a, 4a\}$, where $a$ is a constant, completely simplify the following
expression:
$y_i$
End of exam. Reference sheets follow.
Statistics Quick Reference Card & R Commands by Anthony Tanbakuchi. Version 1.8. http://www.tanbakuchi.com ANTHONY@TANBAKUCHI·COM Get R at: http://www.r-project.org R commands: bold typewriter text
1 Misc R
- To make a vector / store data: x=c(x1, x2, ...)
- Help (general): RSiteSearch("Search Phrase")
- Help (function): ?functionName
- Get column of data from table: tableName$columnName
- List all variables: ls()
- Delete all variables: rm(list=ls())
- $\sqrt{x}$ = sqrt(x)
- $x^n$ = x^n
- $n$ = length(x)
- $T$ = table(x)
2 Descriptive Statistics
2.1 NUMERICAL
Let x=c(x1, x2, x3, ...)
$\text{total} = \sum_{i=1}^{n} x_i$ = sum(x)
min = min(x), max = max(x)
six number summary: summary(x)
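$\mu = \frac{\sum x_i}{N}$ = mean(x)
$\bar{x} = \frac{\sum x_i}{n}$ = mean(x)
$\tilde{x} = P_{50}$ = median(x)
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ = sd(x)
$CV = \frac{\sigma}{\mu}$ (population), $CV = \frac{s}{\bar{x}}$ (sample)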
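As a cross-check on the numerical summaries above, here is a minimal Python sketch that uses only the standard-library `statistics` module. Python is an illustrative choice; the card's own commands are the R ones listed. The data vector `x` is hypothetical. `statistics.stdev` divides by $n - 1$ like R's `sd(x)`, while `statistics.pstdev` divides by $N$ as in the population formula for $\sigma$.

```python
import statistics

x = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical sample data

total = sum(x)                  # like sum(x) in R
x_bar = statistics.mean(x)      # like mean(x)
x_tilde = statistics.median(x)  # like median(x)
s = statistics.stdev(x)         # sample sd, divides by n - 1 (like sd(x))
sigma = statistics.pstdev(x)    # population sigma, divides by N
cv = s / x_bar                  # sample coefficient of variation

print(total, x_bar, x_tilde, round(s, 3), round(cv, 3))
```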
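2.2 RELATIVE STANDING
$z = \frac{x - \mu}{\sigma}$ (population), $z = \frac{x - \bar{x}}{s}$ (sample)
Percentiles: $P_k = x_i$ (sorted x), where $k = \frac{i - 0.5}{n} \cdot 100\%$
To find $x_i$ given $P_k$, i is:
- $L = \frac{k}{100\%}\, n$
- if L is an integer: $i = L + 0.5$; otherwise $i = L$ and round up.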
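The percentile rule above can be sketched in Python as follows. This is a minimal sketch under one reading of the card: when $L$ is a whole number, $i = L + 0.5$ is taken to mean the average of the $L$-th and $(L+1)$-th sorted values. The function names and the data are hypothetical. Other software, including R's `quantile()`, may use different percentile conventions and return different values.

```python
import math

def percentile_of_position(n, i):
    """k (in %) for the i-th smallest of n values: k = (i - 0.5) / n * 100%."""
    return (i - 0.5) / n * 100

def value_at_percentile(x, k):
    """Return P_k for 0 < k < 100 using the locator L = (k/100%) n."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    xs = sorted(x)
    n = len(xs)
    if n == 0:
        raise ValueError("x must not be empty")
    L = k * n / 100
    if float(L).is_integer():
        # L is a whole number: i = L + 0.5, read here as the average of the
        # L-th and (L+1)-th sorted values (1-based positions)
        L = int(L)
        return (xs[L - 1] + xs[L]) / 2
    # Otherwise round L up and take that (1-based) position
    return xs[math.ceil(L) - 1]

x = [12, 4, 9, 2, 7, 10, 4, 5]  # hypothetical data (n = 8)
print(value_at_percentile(x, 25), value_at_percentile(x, 50), value_at_percentile(x, 90))
print(percentile_of_position(len(x), 5))
```

With these data, $P_{50}$ comes out equal to the median, as $\tilde{x} = P_{50}$ requires.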
2.3 VISUAL
All plots have optional arguments:
- main="" sets title
- xlab="", ylab="" sets x/y-axis label
- type="p" for point plot
- type="l" for line plot
- type="b" for both points and lines
Ex: plot(x, y, type="b", main="My Plot")
Plot Types:
- hist(x): histogram
- stem(x): stem & leaf
- boxplot(x): box plot
- plot(T): bar plot, T=table(x)
- plot(x,y): scatter plot, x, y are ordered vectors
- plot(t,y): time series plot, t, y are ordered vectors
- curve(expr, xmin, xmax): plot expr involving x
2.4 ASSESSING NORMALITY
Q-Q plot: qqnorm(x); qqline(x)
3 Probability
Number of successes x with n possible outcomes. (Don’t double count!)
$P(A) = \frac{x_A}{n}$
$P(\bar{A}) = 1 - P(A)$
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
$P(A \text{ or } B) = P(A) + P(B)$ if A, B mut. excl.
$P(A \text{ and } B) = P(A) \cdot P(B|A)$
$P(A \text{ and } B) = P(A) \cdot P(B)$ if A, B independent
$n! = n(n-1)\cdots 1$ = factorial(n)
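Here is a small worked example of the rules above. It uses exact fractions so the arithmetic stays readable. The deck-of-cards setting is a hypothetical illustration and does not appear on the card.

```python
from fractions import Fraction

# Hypothetical example: draws from a standard 52-card deck
p_heart = Fraction(13, 52)
p_king = Fraction(4, 52)
p_heart_and_king = Fraction(1, 52)  # the king of hearts

p_not_heart = 1 - p_heart                              # complement rule
p_heart_or_king = p_heart + p_king - p_heart_and_king  # addition rule (not mutually exclusive)

# Multiplication rule without replacement: P(A and B) = P(A) * P(B|A)
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

print(p_not_heart, p_heart_or_king, p_two_aces)  # 3/4 4/13 1/221
```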
$_nP_k = \frac{n!}{(n-k)!}$ Perm. no elem. alike
$\frac{n!}{n_1!\, n_2! \cdots n_k!}$ Perm. $n_1$ alike, ...
$_nC_k = \frac{n!}{(n-k)!\, k!}$ = choose(n,k)
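The counting formulas can be checked with Python's `math.perm` and `math.comb`, both available since Python 3.8. These correspond to $_nP_k$ and to R's `choose(n,k)`. `perm_alike` is a hypothetical helper for the case where some elements are alike. The word MISSISSIPPI serves only as an example (1 M, 4 I, 4 S, 2 P).

```python
from math import comb, factorial, perm

def perm_alike(*group_sizes):
    """n! / (n1! n2! ... nk!) where n = n1 + n2 + ... + nk (hypothetical helper)."""
    result = factorial(sum(group_sizes))
    for size in group_sizes:
        result //= factorial(size)  # each partial quotient is an exact integer
    return result

print(perm(5, 2))              # nPk with n = 5, k = 2 -> 20
print(comb(5, 2))              # nCk, like choose(5, 2) in R -> 10
print(perm_alike(1, 4, 4, 2))  # arrangements of MISSISSIPPI -> 34650
```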
4 Discrete Random Variables
$P(x_i)$: probability distribution
$E = \mu = \sum x_i \cdot P(x_i)$
$\sigma = \sqrt{\sum (x_i - \mu)^2 \cdot P(x_i)}$
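The expected-value and standard-deviation formulas above can be sketched directly in Python. The function name and the fair-die example are hypothetical illustrations.

```python
import math

def discrete_mean_sd(values, probs):
    """Return (mu, sigma) for a discrete distribution given x_i and P(x_i)."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("P(x_i) must be non-negative and sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))               # E = sum x_i P(x_i)
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # sum (x_i - mu)^2 P(x_i)
    return mu, math.sqrt(var)

# Hypothetical example: one roll of a fair six-sided die
faces = [1, 2, 3, 4, 5, 6]
mu, sigma = discrete_mean_sd(faces, [1 / 6] * 6)
print(round(mu, 3), round(sigma, 3))
```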
4.1 BINOMIAL DISTRIBUTION
$\mu = n \cdot p$, $\sigma = \sqrt{n \cdot p \cdot q}$
$P(x) = {}_nC_x\, p^x q^{(n-x)}$ = dbinom(x, n, p)
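A minimal Python sketch of the binomial formula follows. `binom_pmf` is a hypothetical name for the quantity R computes with `dbinom`, and the parameters are hypothetical. Applying the general discrete-distribution formulas to these probabilities reproduces the shortcuts $\mu = np$ and $\sigma = \sqrt{npq}$.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(x) = nCx p^x q^(n - x), the same quantity as dbinom(x, n, p) in R."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.3  # hypothetical number of trials and success probability
probs = [binom_pmf(x, n, p) for x in range(n + 1)]

# Apply the general discrete formulas to the binomial probabilities
mu = sum(x * px for x, px in enumerate(probs))
sigma = sqrt(sum((x - mu) ** 2 * px for x, px in enumerate(probs)))

print(round(sum(probs), 6), round(mu, 6), n * p)
print(round(sigma, 6), round(sqrt(n * p * (1 - p)), 6))
```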
4.2 POISSON DISTRIBUTION
$P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$ = dpois(x, μ)
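The Poisson formula translates directly into Python. `poisson_pmf` is a hypothetical name for what R's `dpois` computes, and the mean of 2.0 occurrences per interval is a hypothetical value.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x) = mu^x e^(-mu) / x!, the same quantity as dpois(x, mu) in R."""
    return mu**x * exp(-mu) / factorial(x)

mu = 2.0  # hypothetical mean number of occurrences per interval
for x in range(5):
    print(x, round(poisson_pmf(x, mu), 4))
```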
5 Continuous random variables
CDF F(x) gives the area to the left of x; $F^{-1}(p)$ expects p to be the area to the left.
$f(x)$: probability density
$E = \mu = \int_{-\infty}^{\infty} x \cdot f(x)\, dx$
$\sigma = \sqrt{\int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\, dx}$
$F(x)$: cumulative prob. density (CDF)
$F^{-1}(x)$: inv. cumulative prob. density
$F(x) = \int_{-\infty}^{x} f(x')\, dx'$
$p = P(x < x') = F(x')$
$x' = F^{-1}(p)$
$p = P(x > a) = 1 - F(a)$
$p = P(a < x < b) = F(b) - F(a)$
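The CDF relations above can be checked numerically. This minimal sketch uses the standard normal as the example continuous distribution, Python's `statistics.NormalDist` for $F$ and $F^{-1}$, and a midpoint-rule integral of the density $f$. The helper names and the interval $(-1, 1)$ are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def std_normal_pdf(x):
    """Standard normal density f(x) (mu = 0, sigma = 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * math.fsum(f(a + (k + 0.5) * h) for k in range(steps))

F = NormalDist().cdf          # CDF: area to the left of x
F_inv = NormalDist().inv_cdf  # inverse CDF: expects an area to the left

a, b = -1.0, 1.0
p_between = F(b) - F(a)                              # P(a < x < b) = F(b) - F(a)
p_between_numeric = integrate(std_normal_pdf, a, b)  # same area by integrating f
p_above = 1 - F(a)                                   # P(x > a) = 1 - F(a)
x_back = F_inv(F(0.5))                               # F^{-1}(F(x')) recovers x' = 0.5

print(round(p_between, 4), round(p_between_numeric, 4), round(p_above, 4), round(x_back, 4))
```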
5.1 UNIFORM DISTRIBUTION
$p = P(u < u') = F(u')$ = punif(u', min=0, max=1)
$u' = F^{-1}(p)$ = qunif(p, min=0, max=1)
5.2 NORMAL DISTRIBUTION
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}$
$p = P(z < z') = F(z')$ = pnorm(z')
$z' = F^{-1}(p)$ = qnorm(p)
$p = P(x < x') = F(x')$ = pnorm(x', mean=μ, sd=σ)
$x' = F^{-1}(p)$ = qnorm(p, mean=μ, sd=σ)
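Python's `statistics.NormalDist` offers `cdf` and `inv_cdf` methods that play the roles of `pnorm` and `qnorm` above. This sketch uses hypothetical parameters ($\mu = 100$, $\sigma = 15$). It shows that standardizing to $z$ gives the same area as working on the original scale.

```python
from statistics import NormalDist

mu, sigma = 100, 15        # hypothetical population parameters
X = NormalDist(mu=mu, sigma=sigma)
Z = NormalDist()           # standard normal: mu = 0, sigma = 1

p_x = X.cdf(130)           # like pnorm(130, mean=100, sd=15)
z = (130 - mu) / sigma     # standardize: z = (x - mu) / sigma
p_z = Z.cdf(z)             # like pnorm(2); same area as p_x
x90 = X.inv_cdf(0.90)      # like qnorm(0.90, mean=100, sd=15)

print(round(p_x, 4), round(p_z, 4), round(x90, 2))
```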
5.3 t-DISTRIBUTION
$p = P(t < t') = F(t')$ = pt(t', df)
$t' = F^{-1}(p)$ = qt(p, df)
5.4 $\chi^2$-DISTRIBUTION
$p = P(\chi^2 < \chi^{2\prime}) = F(\chi^{2\prime})$ = pchisq(X^2', df)
$\chi^{2\prime} = F^{-1}(p)$ = qchisq(p, df)
5.5 F-DISTRIBUTION
$p = P(F < F') = F(F')$ = pf(F', df1, df2)
$F' = F^{-1}(p)$ = qf(p, df1, df2)
6 Sampling distributions
$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
$\mu_{\hat{p}} = p$, $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$
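The standard-error formulas above can be checked with a quick simulation. This minimal sketch draws many samples of size $n$ from a hypothetical normal population ($\mu = 50$, $\sigma = 10$, $n = 25$) and compares the spread of the sample means with $\sigma/\sqrt{n}$. The seed, the number of repetitions, and the helper names are arbitrary choices for illustration.

```python
import math
import random
import statistics

def se_mean(sigma, n):
    """sigma_xbar = sigma / sqrt(n)"""
    return sigma / math.sqrt(n)

def se_prop(p, n):
    """sigma_phat = sqrt(p q / n)"""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical population: normal with mu = 50, sigma = 10; samples of size n = 25
random.seed(1)
mu, sigma, n, reps = 50.0, 10.0, 25, 4000
sample_means = [math.fsum(random.gauss(mu, sigma) for _ in range(n)) / n
                for _ in range(reps)]

print("theory:", mu, se_mean(sigma, n))
print("simulated:", round(statistics.mean(sample_means), 3),
      round(statistics.stdev(sample_means), 3))
```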
7 Estimation
7.1 CONFIDENCE INTERVALS
proportion: $\hat{p} \pm E$, $E = z_{\alpha/2} \cdot \sigma_{\hat{p}}$
mean (σ known): $\bar{x} \pm E$, $E = z_{\alpha/2} \cdot \sigma_{\bar{x}}$
mean (σ unknown, use s): $\bar{x} \pm E$, $E = t_{\alpha/2} \cdot \sigma_{\bar{x}}$, $df = n - 1$
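Here is a minimal Python sketch of the two z-based intervals above. Following the usual practice, it estimates $\sigma_{\hat{p}}$ with $\hat{p}$ and $\hat{q}$ because $p$ is unknown. The function names and data are hypothetical. The σ-unknown case needs a t quantile (R's `qt`), which Python's standard library does not provide, so that case is not shown.

```python
import math
from statistics import NormalDist

def z_crit(conf):
    """z_{alpha/2} = F_z^{-1}(1 - alpha/2), like qnorm(1-alpha/2) on the card."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_proportion(successes, n, conf=0.95):
    """p_hat +/- E with E = z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    E = z_crit(conf) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

def ci_mean_sigma_known(x_bar, sigma, n, conf=0.95):
    """x_bar +/- E with E = z_{alpha/2} * sigma / sqrt(n)."""
    E = z_crit(conf) * sigma / math.sqrt(n)
    return x_bar - E, x_bar + E

# Hypothetical data: 60 successes in 100 trials; x_bar = 50 from n = 25 with sigma = 10
print([round(v, 3) for v in ci_proportion(60, 100)])
print([round(v, 3) for v in ci_mean_sigma_known(50.0, 10.0, 25)])
```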
variance: $\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}$, $df = n - 1$
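2 proportions: $\Delta\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
2 means (indep): $\Delta\bar{x} \pm t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, $df \approx \min(n_1 - 1, n_2 - 1)$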
matched pairs: $\bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$, $d_i = x_i - y_i$, $df = n - 1$
7.2 CI CRITICAL VALUES (TWO SIDED)
$z_{\alpha/2} = F_z^{-1}(1 - \alpha/2)$ = qnorm(1-alpha/2)
$t_{\alpha/2} = F_t^{-1}(1 - \alpha/2)$ = qt(1-alpha/2, df)
$\chi^2_L = F_{\chi^2}^{-1}(\alpha/2)$ = qchisq(alpha/2, df)
$\chi^2_R = F_{\chi^2}^{-1}(1 - \alpha/2)$ = qchisq(1-alpha/2, df)
7.3 REQUIRED SAMPLE SIZE
proportion: $n = \hat{p}\hat{q}\left(\frac{z_{\alpha/2}}{E}\right)^2$ ($\hat{p} = \hat{q} = 0.5$ if unknown)
mean: $n = \left(\frac{z_{\alpha/2} \cdot \hat{\sigma}}{E}\right)^2$
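The sample-size formulas above can be sketched in Python. The result is rounded up, following the usual convention that $n$ must be a whole number no smaller than the formula's value. The function names, margins of error, and $\hat{\sigma}$ below are hypothetical.

```python
import math
from statistics import NormalDist

def z_crit(conf):
    """z_{alpha/2} for a two-sided interval at confidence level conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

def n_for_proportion(E, conf=0.95, p_hat=0.5):
    """n = p_hat * q_hat * (z_{alpha/2} / E)^2, rounded up; p_hat = q_hat = 0.5 if unknown."""
    q_hat = 1 - p_hat
    return math.ceil(p_hat * q_hat * (z_crit(conf) / E) ** 2)

def n_for_mean(E, sigma_hat, conf=0.95):
    """n = (z_{alpha/2} * sigma_hat / E)^2, rounded up."""
    return math.ceil((z_crit(conf) * sigma_hat / E) ** 2)

print(n_for_proportion(0.03))           # hypothetical: E = 3 percentage points
print(n_for_mean(2.0, sigma_hat=15.0))  # hypothetical: E = 2 units, sigma_hat = 15
```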