Scatter Plot - Introduction to Statistics - Exam, Exams of Statistics

This is the Past Exam of Introduction to Statistics which includes Variables, Description, Different Houses O, Quantitative, Square Footage, Monthly Gas Bill, Monthly Electric Bill, Heights, Histogram etc. Key important points are: Scatter Plot, Median, Mode, Standard Deviation, Percentile, Histogram, Pareto Chart, Box Plot, Confidence Interval, Difference In Means

Typology: Exams

2012/2013

Uploaded on 02/26/2013

MAT 167: Statistics
Final Exam
Instructor: Anthony Tanbakuchi
Spring 2009
Name:
Computer / Seat Number:
No books, notes, or friends. Show your work. You may use the attached
equation sheet, R, and a calculator. No other materials. If you choose to use R,
write what you typed on the test. Using any other program or having any other
documents open on the computer will constitute cheating.
You have until the end of class to finish the exam; manage your time wisely.
If something is unclear, quietly come up and ask me.
If the question is legitimate, I will inform the whole class.
Express all final answers to 3 significant digits. Probabilities should be given as a
decimal number unless a percent is requested. Circle final answers; ambiguous or
multiple answers will not be accepted. Show steps where appropriate.
The exam consists of 24 questions for a total of 71 points on 9 pages.
This Exam is being given under the guidelines of our institution’s
Code of Academic Ethics. You are expected to respect those guidelines.
Points Earned: out of 71 total points
Exam Score:


1. The following is a partial list of statistical methods that we have discussed:

1. mean
2. median
3. mode
4. standard deviation
5. z-score
6. percentile
7. coefficient of variation
8. scatter plot
9. histogram
10. Pareto chart
11. box plot
12. normal-quantile plot
13. confidence interval for a mean
14. confidence interval for difference in means
15. confidence interval for a proportion
16. confidence interval for difference in proportions
17. one sample mean test
18. two independent sample mean test
19. matched pair test
20. one sample proportion test
21. two sample proportion test
22. test of homogeneity
23. test of independence
24. linear correlation coefficient & test
25. regression
26. 1-way ANOVA

For each situation below, which method is most applicable?

(a) (1 point) A researcher would like to estimate the mean weight of javelina.

(b) (1 point) A researcher wants to determine if bear weights are normally distributed.

(c) (1 point) An education researcher wants to determine if the probability a student will graduate from middle school is affected by their economic status (poor, lower middle class, middle class, ...).

(d) (1 point) A farmer wants to determine if the mean crop yield is the same for eight different brands of fertilizer.

(e) (1 point) A fertility researcher wants to determine if a new drug can decrease the proportion of infertile mice. Twenty mice are randomly divided into two groups, a treatment group and a control group.

2. (1 point) What test is a many sample generalization of the two sample t-test?

3. (1 point) If the mean, median, and mode for a data set are different, what can you conclude about the data’s distribution?

13. Provide short, succinct written answers to the following conceptual questions.

(a) (1 point) Give an example of a categorical type of variable.

(b) (1 point) Which of the following measures of variation is least susceptible to outliers: standard deviation, inter-quartile range, range?

(c) (1 point) What percent of data is greater than Q3?

(d) (1 point) What does the standard deviation represent conceptually in words? (Be concise but don’t simply state the equation in words verbatim.)

(e) (1 point) Why would an SAT percentile be preferred over a raw SAT score for college admissions committees?

14. (2 points) Car tires must not deform or explode when inflated up to their maximum pressure rating. Before distributing the tires, they must be tested. To test the safety of tires, an inspector randomly samples 50 tires (without replacement) from a batch of 5,000 that have been manufactured. The inspector inflates each of the fifty tires until they explode or deform to make sure they meet the minimum safety requirements. If none of the sampled tires fails the test, the tires will be distributed to dealers. If the batch contains 15 defective tires that will explode if selected, what is the probability that the batch will be rejected?
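
A setup like this can be sketched in R with the hypergeometric distribution. The sketch assumes the batch is rejected as soon as at least one sampled tire fails; the variable names are illustrative and do not come from the exam.

```r
# Hypergeometric sketch.
# Assumption: the batch is rejected if at least one sampled tire is defective.
batch_size  <- 5000
n_defective <- 15
n_sampled   <- 50

# P(no defective tire in the sample) = C(4985, 50) / C(5000, 50)
p_none <- dhyper(0, n_defective, batch_size - n_defective, n_sampled)

# P(batch rejected) = 1 - P(no defective tire in the sample)
p_reject <- 1 - p_none
print(signif(p_reject, 3))
```

The complement is used because "at least one defective" is easier to reach through "none defective" than by summing the cases 1 through 15.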

15. (2 points) If a class consists of 20 males and 8 females, what is the probability of drawing 4 females without replacement?
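
A counting sketch in R, assuming exactly four students are drawn and all four must be female. It uses choose(n,k) from the reference card; the variable names are illustrative.

```r
# Assumption: 4 students are drawn without replacement and all 4 are female.
n_female <- 8
n_male   <- 20
n_drawn  <- 4

# Favorable groups of 4 females divided by all possible groups of 4 students
p_four_females <- choose(n_female, n_drawn) / choose(n_female + n_male, n_drawn)

# Same probability from sequential conditional probabilities
p_check <- (8 / 28) * (7 / 27) * (6 / 26) * (5 / 25)

print(signif(p_four_females, 3))
```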

16. (2 points) You would like to conduct a study to estimate (at the 95% confidence level) the proportion of households that own one or more encyclopedias. What sample size do you need to estimate the proportion with a margin of error of 2%?
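
The required-sample-size formula for a proportion on the reference card can be sketched in R as follows. The sketch assumes no prior estimate of the proportion is available, so it uses 0.5 as the card suggests; the variable names are illustrative.

```r
# Sample size for estimating a proportion: n = p*q * (z_{alpha/2} / E)^2
conf_level <- 0.95
margin     <- 0.02
p_hat      <- 0.5   # assumption: no prior estimate, so use the conservative 0.5

z_crit <- qnorm(1 - (1 - conf_level) / 2)

# Round up, because a fractional respondent is not possible
n_required <- ceiling(p_hat * (1 - p_hat) * (z_crit / margin)^2)
print(n_required)
```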

17. The following questions regard hypothesis testing in general.

(a) (1 point) When we conduct a hypothesis test, we assume something is true and calculate the probability of observing the sample data under this assumption. What do we assume is true?

(b) (1 point) If you reject H0 but H0 is true, what type of error has occurred? (Type I or Type II)

(c) (1 point) What variable represents the actual Type I error?

(d) (1 point) What does the power of a hypothesis test represent?

18. Eighteen students were randomly selected to take the SAT after having either no breakfast or a complete breakfast. A researcher would like to test the claim that students who eat breakfast score higher than students who do not.

Group without breakfast (SAT score): 480 510 530 540 550 560 600 620 660
Group with breakfast (SAT score):    460 500 530 520 580 580 560 640 690
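
One possible analysis in R, treating the two groups as independent samples (an assumption: the data are not described as paired). R's t.test() defaults to the Welch form of the two-sample test, which may differ slightly from the degrees-of-freedom rule on the reference card. Variable names are illustrative.

```r
# Assumption: two independent samples of 9 students each.
sat_no_breakfast   <- c(480, 510, 530, 540, 550, 560, 600, 620, 660)
sat_with_breakfast <- c(460, 500, 530, 520, 580, 580, 560, 640, 690)

# One-sided alternative: mean with breakfast > mean without breakfast
result <- t.test(sat_with_breakfast, sat_no_breakfast, alternative = "greater")

mean_difference <- mean(sat_with_breakfast) - mean(sat_no_breakfast)
print(result$p.value)
```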

[Figure: head breadth (cm) by epoch (150 AD, 1850 BC, 4000 BC)]

(a) (1 point) What type of hypothesis test (of those discussed in class) should you use?

(b) (1 point) What is the alternative hypothesis for this test?

(c) (1 point) What alpha will you use?

(d) (1 point) What is the response variable for this study?

(e) (1 point) What is the factor variable for this study?

(f) (1 point) The analysis of the data was run and the output is shown below. What is your final conclusion (not the formal decision)?

            Df  Sum Sq  Mean Sq  F value  Pr(>F)
epoch        2  138.74    69.37     4.05  0.
Residuals   24  411.11    17.

(g) (1 point) Assuming the researcher rejected the null hypothesis, what is the probability of a Type I error for this study?
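
The relation between the F value and Pr(>F) in output like this can be sketched with pf() from the reference card. The inputs are the F value and degrees of freedom printed above; the variable names are illustrative.

```r
# Upper-tail area of the F distribution: Pr(>F) = 1 - F(F')
f_value   <- 4.05
df_factor <- 2    # Df for epoch
df_resid  <- 24   # Df for residuals

p_value <- pf(f_value, df_factor, df_resid, lower.tail = FALSE)
print(signif(p_value, 3))
```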

20. The following table lists the fuel consumption (in miles/gallon) and weight (in lbs) for a sample of vehicles.

Weight (lbs)  3180  3450  3225  3985  2440  2500  2290
MPG             27    29    27    24    37    34    37

(a) (2 points) Upon looking at the scatter plot of the data, the relationship of fuel consumption and weight looks linear. Is the linear relationship statistically significant? (Justify your answer with an analysis.)

(b) (1 point) What percent of a vehicle’s fuel consumption can be explained by its weight?

(c) (2 points) You are designing a new vehicle and would like to be able to predict its fuel consumption. Write the equation for the fitted model (with the actual values of the coefficients).

(d) (1 point) What range of vehicle weights is the model valid for making predictions of fuel efficiency?

(e) (1 point) What is the best predicted fuel consumption for a new vehicle that weighs 3200 lbs?

(f) (1 point) If the linear relationship had not been statistically significant, what is the best predicted fuel consumption for a new vehicle that weighs 3200 lbs?
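
A regression workflow for data like this might look as follows in R. This is a sketch, not the instructor's solution; the object names are illustrative, and falling back on the sample mean when the correlation is not significant is the usual textbook convention.

```r
weight <- c(3180, 3450, 3225, 3985, 2440, 2500, 2290)  # lbs
mpg    <- c(27, 29, 27, 24, 37, 34, 37)                # miles/gallon

# Test of the linear correlation coefficient (H0: rho = 0)
r_test <- cor.test(weight, mpg)

# Least-squares line: mpg = b0 + b1 * weight
fit <- lm(mpg ~ weight)
r_squared <- summary(fit)$r.squared

# The fitted line should only be used inside the observed weight range
weight_range <- range(weight)

# Prediction from the fitted line for a 3200 lb vehicle
pred_3200 <- predict(fit, newdata = data.frame(weight = 3200))

# Fallback prediction if the relationship were not significant
mean_mpg <- mean(mpg)

print(coef(fit))
```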

23. Engineers must consider the breadths of male heads when designing motorcycle helmets. Men have head breadths that are normally distributed with a mean of 6.0 in and a standard deviation of 1.0 in (based on anthropometric survey data from Gordon, Churchill, et al.).

(a) (2 points) If 1 man is randomly selected, find the probability that his head breadth is greater than 6.1 in.

(b) (2 points) If 100 men are randomly selected, find the probability that their mean head breadth is greater than 6.1 in.
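
The difference between an individual value and a sample mean can be sketched with pnorm() from the reference card. For the mean of n values, the standard deviation is sigma / sqrt(n), as in the sampling distribution formulas. Variable names are illustrative.

```r
mu    <- 6.0   # population mean head breadth (in)
sigma <- 1.0   # population standard deviation (in)
cutoff <- 6.1

# One man: P(x > 6.1) uses the population standard deviation
p_individual <- 1 - pnorm(cutoff, mean = mu, sd = sigma)

# Mean of 100 men: P(x_bar > 6.1) uses sigma / sqrt(n)
n <- 100
p_sample_mean <- 1 - pnorm(cutoff, mean = mu, sd = sigma / sqrt(n))

print(signif(c(p_individual, p_sample_mean), 3))
```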

24. (2 points) Given y = {a, −2a, 4a}, where a is a constant, completely simplify the following expression:

y_i

End of exam. Reference sheets follow.

Statistics Quick Reference Card & R Commands
by Anthony Tanbakuchi. Version 1.8.
http://www.tanbakuchi.com
ANTHONY@TANBAKUCHI·COM
Get R at: http://www.r-project.org
R commands: bold typewriter text

1 Misc R

To make a vector / store data: x=c(x1, x2, ...)
Help (general): RSiteSearch("Search Phrase")
Help (function): ?functionName
Get column of data from table: tableName$columnName
List all variables: ls()
Delete all variables: rm(list=ls())

$\sqrt{x}$ = sqrt(x) (1)
$x^n$ = x^n (2)
$n$ = length(x) (3)
$T$ = table(x) (4)

2 Descriptive Statistics

2.1 NUMERICAL

Let x=c(x1, x2, x3, ...)

total $= \sum_{i=1}^{n} x_i$ = sum(x) (5)

min = min(x) (6)
max = max(x) (7)
six number summary: summary(x) (8)

$\mu = \frac{\sum x_i}{N}$ = mean(x) (9)

$\bar{x} = \frac{\sum x_i}{n}$ = mean(x) (10)

$\tilde{x} = P_{50}$ = median(x) (11)

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$ (12)

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ = sd(x) (13)

$CV = \frac{\sigma}{\mu} = \frac{s}{\bar{x}}$ (14)
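
The commands above can be tried on a small vector. The data below are made up for illustration and are not from the exam; the last lines check that sd() matches the sample formula (13) with n − 1 in the denominator.

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n      <- length(x)
x_bar  <- mean(x)
x_med  <- median(x)
s      <- sd(x)          # sample standard deviation, n - 1 denominator
cv     <- s / x_bar      # coefficient of variation

# Sample standard deviation written out from formula (13)
s_manual <- sqrt(sum((x - x_bar)^2) / (n - 1))

print(c(mean = x_bar, median = x_med, sd = s, cv = cv))
```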

2.2 RELATIVE STANDING

$z = \frac{x - \mu}{\sigma} = \frac{x - \bar{x}}{s}$ (15)

Percentiles: $P_k = x_i$ (sorted x)

$k = \frac{i - 0.5}{n} \cdot 100\%$ (16)

To find $x_i$ given $P_k$, i is:

  1. L = (k/100%)n
  2. if L is an integer: i = L + 0.5; otherwise i = L and round up.

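
The two-step locator rule can be sketched as an R function. The function name is hypothetical, and the sketch reads i = L + 0.5 as "halfway between the values at positions L and L + 1", which is an interpretation rather than something the card states. It assumes 0 < k < 100.

```r
# Hypothetical helper implementing the locator rule above (assumes 0 < k < 100).
percentile_value <- function(x, k) {
  sorted <- sort(x)
  n <- length(sorted)
  L <- (k / 100) * n
  if (isTRUE(all.equal(L, round(L)))) {
    # L is an integer: i = L + 0.5, read here as the midpoint of
    # the values at positions L and L + 1 (an interpretation)
    L <- round(L)
    return(mean(sorted[c(L, L + 1)]))
  }
  # Otherwise round L up to the next whole position
  sorted[ceiling(L)]
}

p50_odd  <- percentile_value(c(480, 510, 530, 540, 550, 560, 600, 620, 660), 50)
p50_even <- percentile_value(1:10, 50)
p25_even <- percentile_value(1:10, 25)
```

With an odd number of values the 50th percentile lands on a single data point, and with an even number it falls halfway between the two middle values, matching median(x).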
2.3 VISUAL

All plots have optional arguments:

  • main="" sets title
  • xlab="", ylab="" sets x/y-axis label
  • type="p" for point plot
  • type="l" for line plot
  • type="b" for both points and lines

Ex: plot(x, y, type="b", main="My Plot")

Plot Types:

  • hist(x) histogram
  • stem(x) stem & leaf
  • boxplot(x) box plot
  • plot(T) bar plot, T=table(x)
  • plot(x,y) scatter plot, x, y are ordered vectors
  • plot(t,y) time series plot, t, y are ordered vectors
  • curve(expr, xmin,xmax) plot expr involving x

2.4 ASSESSING NORMALITY

Q-Q plot: qqnorm(x); qqline(x)

3 Probability

Number of successes x with n possible outcomes. (Don’t double count!)

$P(A) = \frac{x_A}{n}$ (17)

$P(\bar{A}) = 1 - P(A)$ (18)

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ (19)

$P(A \text{ or } B) = P(A) + P(B)$ if A, B mut. excl. (20)

$P(A \text{ and } B) = P(A) \cdot P(B|A)$ (21)

$P(A \text{ and } B) = P(A) \cdot P(B)$ if A, B independent (22)

$n! = n(n - 1) \cdots 1$ = factorial(n) (23)

$_nP_k = \frac{n!}{(n - k)!}$ Perm. no elem. alike (24)

$= \frac{n!}{n_1! \, n_2! \cdots n_k!}$ Perm. $n_1$ alike, ... (25)

$_nC_k = \frac{n!}{(n - k)! \, k!}$ = choose(n,k) (26)

4 Discrete Random Variables

$P(x_i)$ : probability distribution (27)

$E = \mu = \sum x_i \cdot P(x_i)$ (28)

$\sigma = \sqrt{\sum (x_i - \mu)^2 \cdot P(x_i)}$ (29)

4.1 BINOMIAL DISTRIBUTION

$\mu = n \cdot p$ (30)

$\sigma = \sqrt{n \cdot p \cdot q}$ (31)

$P(x) = {}_nC_x \, p^x q^{(n - x)}$ = dbinom(x, n, p) (32)

4.2 POISSON DISTRIBUTION

$P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$ = dpois(x, μ) (33)
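
As a quick check of the binomial formulas, the sketch below compares dbinom() with formula (32) written out by hand and recovers the mean and standard deviation from the general definitions (28) and (29). The values n = 10 and p = 0.3 are arbitrary illustrative choices.

```r
# Hypothetical parameters, for illustration only
n <- 10
p <- 0.3
q <- 1 - p

# P(x = 3) from the built-in function and from formula (32)
p_builtin <- dbinom(3, n, p)
p_formula <- choose(n, 3) * p^3 * q^(n - 3)

# Mean and standard deviation from the full probability distribution
x_values <- 0:n
probs    <- dbinom(x_values, n, p)
mu_dist    <- sum(x_values * probs)
sigma_dist <- sqrt(sum((x_values - mu_dist)^2 * probs))

print(c(mu_dist, n * p, sigma_dist, sqrt(n * p * q)))
```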

5 Continuous random variables

CDF F(x) gives area to the left of x, $F^{-1}(p)$ expects p is area to the left.

$f(x)$ : probability density (34)

$E = \mu = \int_{-\infty}^{\infty} x \cdot f(x) \, dx$ (35)

$\sigma = \sqrt{\int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x) \, dx}$ (36)

$F(x)$ : cumulative prob. density (CDF) (37)

$F^{-1}(x)$ : inv. cumulative prob. density (38)

$F(x) = \int_{-\infty}^{x} f(x') \, dx'$ (39)

$p = P(x < x') = F(x')$ (40)

$x' = F^{-1}(p)$ (41)

$p = P(x > a) = 1 - F(a)$ (42)

$p = P(a < x < b) = F(b) - F(a)$ (43)

5.1 UNIFORM DISTRIBUTION

$p = P(u < u') = F(u')$ = punif(u', min=0, max=1) (44)

$u' = F^{-1}(p)$ = qunif(p, min=0, max=1) (45)

5.2 NORMAL DISTRIBUTION

$f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot e^{-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}}$ (46)

$p = P(z < z') = F(z')$ = pnorm(z') (47)

$z' = F^{-1}(p)$ = qnorm(p) (48)

$p = P(x < x') = F(x')$ = pnorm(x', mean=μ, sd=σ) (49)

$x' = F^{-1}(p)$ = qnorm(p, mean=μ, sd=σ) (50)

5.3 t-DISTRIBUTION

$p = P(t < t') = F(t')$ = pt(t', df) (51)

$t' = F^{-1}(p)$ = qt(p, df) (52)

5.4 χ²-DISTRIBUTION

$p = P(\chi^2 < \chi^{2\prime}) = F(\chi^{2\prime})$ = pchisq(χ²', df) (53)

$\chi^{2\prime} = F^{-1}(p)$ = qchisq(p, df) (54)

5.5 F-DISTRIBUTION

$p = P(F < F') = F(F')$ = pf(F', df1, df2) (55)

$F' = F^{-1}(p)$ = qf(p, df1, df2) (56)

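6 Sampling distributions

$\mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (57)

$\mu_{\hat{p}} = p \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$ (58)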

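7 Estimation

7.1 CONFIDENCE INTERVALS

proportion: $\hat{p} \pm E, \quad E = z_{\alpha/2} \cdot \sigma_{\hat{p}}$ (59)

mean (σ known): $\bar{x} \pm E, \quad E = z_{\alpha/2} \cdot \sigma_{\bar{x}}$ (60)

mean (σ unknown, use s): $\bar{x} \pm E, \quad E = t_{\alpha/2} \cdot \sigma_{\bar{x}}, \quad df = n - 1$ (61)

variance: $\frac{(n - 1) s^2}{\chi^2_R} < \sigma^2 < \frac{(n - 1) s^2}{\chi^2_L}, \quad df = n - 1$ (62)

2 proportions: $\Delta\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$ (63)

2 means (indep): $\Delta\bar{x} \pm t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \quad df \approx \min(n_1 - 1, n_2 - 1)$ (64)

matched pairs: $\bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}, \quad d_i = x_i - y_i, \quad df = n - 1$ (65)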
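
A sketch of the interval for a mean with σ unknown, built by hand from the formula and compared with R's t.test(). It reuses the "without breakfast" SAT scores from question 18 purely as sample data and takes the standard error to be s / sqrt(n); the variable names are illustrative.

```r
# Sample data borrowed from question 18 (group without breakfast)
x <- c(480, 510, 530, 540, 550, 560, 600, 620, 660)
alpha <- 0.05

n      <- length(x)
x_bar  <- mean(x)
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Margin of error E = t_{alpha/2} * s / sqrt(n)
E  <- t_crit * sd(x) / sqrt(n)
ci <- c(x_bar - E, x_bar + E)

# The same interval from the built-in one-sample t procedure
ci_builtin <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)

print(signif(ci, 3))
```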

7.2 CI CRITICAL VALUES (TWO SIDED)

$z_{\alpha/2} = F_z^{-1}(1 - \alpha/2)$ = qnorm(1-alpha/2) (66)

$t_{\alpha/2} = F_t^{-1}(1 - \alpha/2)$ = qt(1-alpha/2, df) (67)

$\chi^2_L = F_{\chi^2}^{-1}(\alpha/2)$ = qchisq(alpha/2, df) (68)

$\chi^2_R = F_{\chi^2}^{-1}(1 - \alpha/2)$ = qchisq(1-alpha/2, df) (69)

7.3 REQUIRED SAMPLE SIZE

proportion: $n = \hat{p} \hat{q} \left( \frac{z_{\alpha/2}}{E} \right)^2$ ($\hat{p} = \hat{q} = 0.5$ if unknown)

mean: $n = \left( \frac{z_{\alpha/2} \cdot \hat{\sigma}}{E} \right)^2$