Regression Analysis and Hypothesis Testing: A Spring 2009 ISYE 2028 Practicum 3 Assignment

A Spring 2009 ISYE 2028 Practicum 3 assignment for Dr. Kobi Abayomi's class. The assignment covers regression analysis, correlation, the coefficient of determination, residuals, and hypothesis testing. Students are asked to generate observed values, compute the correlation and coefficient of determination, and test hypotheses about the relationship between IQ and shoe size.

Uploaded on 08/04/2009

ISYE 2028 A and B
Spring 2009
Practicum 3
Dr. Kobi Abayomi
April 6, 2009
Please be able to show all of your own work and reasoning. Include computer printouts
where you can. You won’t need Good Luck. Due Tuesday, April 14th, in class.

1 Regression

1.1 Residuals

The following is data on shoe size and IQ

ID   IQ    ShoeSize   ei
1    95               6.
2    87               -4.
3    114              -5.
4    116              -3.
5    87               2.
6    134              9.
7    121              1.
8    95               -4.
9    100              -0.

Treat shoe size as the independent variable and IQ as the dependent variable. Two points on the fitted regression line are (0, 20.7) and (1, 28.9).

Generate the observed values for the independent variable.

Compute the correlation between IQ and shoe size, r = ρ̂.

Compute the coefficient of determination R^2.

Compute σ̂^2.
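One way to organize the computations above is sketched below in Python. It assumes the usual simple linear regression identities: each observation satisfies y = b0 + b1·x + e, R^2 = 1 − SSE/SST, r carries the sign of the slope, and σ̂^2 = SSE/(n − 2). The function names `recover_x` and `fit_summaries` are hypothetical, and the full residual values from the table still have to be supplied by the reader.

```python
import math

# Two points on the fitted line, as given in the assignment.
(x0, y0), (x1, y1) = (0.0, 20.7), (1.0, 28.9)
b1 = (y1 - y0) / (x1 - x0)  # slope of the fitted line
b0 = y0 - b1 * x0           # intercept of the fitted line


def recover_x(y, e):
    """Invert y = b0 + b1*x + e to recover the observed x (shoe size)."""
    return (y - e - b0) / b1


def fit_summaries(ys, es):
    """Return (r, R^2, sigma_hat^2) from the responses ys and residuals es.

    Assumes es are least-squares residuals from a fit with an intercept.
    """
    n = len(ys)
    y_bar = sum(ys) / n
    sse = sum(e * e for e in es)             # error sum of squares
    sst = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares
    r2 = 1.0 - sse / sst
    r = math.copysign(math.sqrt(r2), b1)     # r has the sign of the slope
    sigma2 = sse / (n - 2)                   # unbiased estimate of the error variance
    return r, r2, sigma2
```

Calling `recover_x(iq, residual)` row by row fills in the ShoeSize column, and `fit_summaries` then gives the three requested quantities from the IQ and residual columns alone.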

1.2 T/F and Short Answer

Answer these True or False

  1. SSE = 0 implies a vertical regression line.
  2. An estimated regression line is y = 5 + 5x. A researcher adds the observation (25, 80). The observation is likely to be influential.
  3. The coefficient of determination decreases when the Total Sum of Squares increases.

Express the slope of the regression of y on x in a linear regression model, in terms of the correlation, if sx = 5sy.
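As a hedged starting point for the slope question, assume y is regressed on x by least squares and that sx and sy are the sample standard deviations. The standard identity between the slope and the correlation then gives:

```latex
\[
  b_1 \;=\; r\,\frac{s_y}{s_x}
      \;=\; r\,\frac{s_y}{5\,s_y}
      \;=\; \frac{r}{5}.
\]
```

If the roles of x and y are reversed, the ratio flips and the slope becomes 5r instead.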

3 Farmer Njoroge’s Market

Njoroge the farmer has a lbs of apples and b lbs of potatoes for sale. The market price in Kisumu, Kenya, for a pound of apples each day is a random variable with a mean of μx dollars and a standard deviation of σx dollars. Similarly, for a pound of potatoes, the mean price is μy dollars and the standard deviation is σy dollars. Assume that the market prices for potatoes and apples have a correlation of ρ. It costs Njoroge d dollars to bring all the apples and potatoes to market. Assume Njoroge will sell all of each type of produce each day.

a) Define random variables and use them to express Njoroge’s net income, in terms of the constants a, b, d and the parameters μx, μy, σx, σy, ρ.

b) Find the mean and the variance of the net income, also in terms of the given constants and parameters.
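A sketch of the algebra for parts a) and b), under the assumption that X and Y denote the day's price per pound of apples and potatoes respectively, and N (a symbol introduced here for illustration) denotes net income:

```latex
\[
  N = aX + bY - d,
\]
\[
  \mathrm{E}[N] = a\mu_x + b\mu_y - d,
\]
\[
  \mathrm{Var}(N) = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\rho\,\sigma_x\sigma_y .
\]
```

The constant d shifts the mean but does not affect the variance, and the cross term uses Cov(X, Y) = ρσxσy.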

c) Njoroge is a popular name in Kenya. Say that, for any Njoroge selling potatoes and apples at the market, the correlation ρ is uniformly distributed between 0.5 and 0.75. Generate 100 random Njoroges (here ρ is a random quantity) with a = 5, b = 10, and 100 more random Njoroges with a = −5, b = 10. Set the means and variances to 0 and 1, say. Generate appropriate plots. Talk about what you see.
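A minimal simulation sketch for part c), assuming unit variances, zero means, and d ignored (it does not affect the variance). The helper names `net_income_variance` and `simulate` are hypothetical, and the seed is fixed only to make the run repeatable. Plotting is left out; the returned (ρ, variance) pairs can be passed to any plotting tool.

```python
import random


def net_income_variance(a, b, rho, sigma_x=1.0, sigma_y=1.0):
    """Var(aX + bY - d) when Corr(X, Y) = rho."""
    return (a * a * sigma_x ** 2 + b * b * sigma_y ** 2
            + 2 * a * b * rho * sigma_x * sigma_y)


def simulate(a, b, n=100, seed=0):
    """Draw n correlations from Uniform(0.5, 0.75); return (rho, variance) pairs."""
    rng = random.Random(seed)
    rhos = [rng.uniform(0.5, 0.75) for _ in range(n)]
    return [(rho, net_income_variance(a, b, rho)) for rho in rhos]


pos = simulate(5, 10)   # a = 5,  b = 10: variance is 125 + 100*rho
neg = simulate(-5, 10)  # a = -5, b = 10: variance is 125 - 100*rho
```

With these settings the variance rises with ρ when a and b share a sign and falls with ρ when they do not, which is one pattern the plots should make visible.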

d) Say 100 Farmer Njoroges go to sell their wares at the market in Nairobi and their average net income is 135 dollars. When is it reasonable to conclude, at 95 percent confidence, that this market is more profitable than that in Kisumu? What is a 95 percent confidence interval for the net income for a Njoroge? What would be a reasonable conclusion about the difference in market income between Nairobi and Kisumu? Write your answer in terms of the parameters.

Hint: set μx, μy, etc. equal to numbers to see how it works out. Then express your final answer, generally, in symbols.
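Following the hint, the sketch below plugs in numbers to see how part d) works out. Every numeric value passed to the functions is a made-up illustration, not data from the assignment. The sketch also assumes the Nairobi incomes have the same standard deviation as the Kisumu model and that the sample mean of 100 farmers is approximately normal. The function names are hypothetical.

```python
import math


def income_mean_sd(a, b, d, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and standard deviation of net income N = aX + bY - d."""
    mean = a * mu_x + b * mu_y - d
    var = (a * a * sigma_x ** 2 + b * b * sigma_y ** 2
           + 2 * a * b * rho * sigma_x * sigma_y)
    return mean, math.sqrt(var)


def z_stat(sample_mean, n, mean, sd):
    """z statistic for H0: market mean equals `mean`, against a larger mean."""
    return (sample_mean - mean) / (sd / math.sqrt(n))


def ci95(sample_mean, n, sd):
    """Two-sided 95 percent confidence interval for the mean net income."""
    half = 1.96 * sd / math.sqrt(n)
    return sample_mean - half, sample_mean + half


# Hypothetical parameter values, chosen only to illustrate the mechanics.
mean, sd = income_mean_sd(a=5, b=10, d=10, mu_x=10, mu_y=8,
                          sigma_x=2, sigma_y=3, rho=0.6)
z = z_stat(135, 100, mean, sd)
low, high = ci95(135, 100, sd)
```

In symbols, the one-sided comparison rejects when 135 exceeds the Kisumu mean by more than 1.645 standard errors, where the standard error is the net-income standard deviation divided by 10.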

4 Linear Models

Ganesh, in one of his many professions, bestows upon his followers 20 sacred seeds per week. As a spiritual being he is very concerned about his sacred seeds growing in the best conditions. He is to choose among his disciples (Harold, Kumar, or Consuelo) to grow his sacred seeds into sacred fruit. To test them, he records the number of sacred fruit each disciple is able to grow per week for seven weeks. Using the data below, help Ganesh decide which (or whether) to choose.

Week   Harold   Kumar   Consuelo
1      12       15      16
2      12       20      7
3      12       16      8
4      15       9       6
5      12       7       15
6      12       4       20
7      12       11      9
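One possible approach to the disciples' data, offered as an assumption rather than the intended solution, is a one-way analysis of variance comparing the three weekly means. The sketch below computes the F statistic by hand; `one_way_f` is a hypothetical helper name. The spreads of the three disciples differ considerably, so the equal-variance assumption behind the F test deserves scrutiny.

```python
harold = [12, 12, 12, 15, 12, 12, 12]
kumar = [15, 20, 16, 9, 7, 4, 11]
consuelo = [16, 7, 8, 6, 15, 20, 9]


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k)), k - 1, n - k


f_value, df_between, df_within = one_way_f([harold, kumar, consuelo])
```

The F statistic would be compared with the F distribution on (2, 18) degrees of freedom. Because the means are close while the variability is not, Ganesh may care more about consistency than about the mean alone.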

Claim: A man is like a horse (proportionally). To test this, the Horseman club collects data on the number of kg of fertilizer a man can carry vs. the number a horse can carry. Denote relevant variables, and write out an appropriate testing procedure fully (with test statistic, hypothesis, assumptions, etc.) for the Horseman club. You are their statistician. Here is your data. It may help you to know that the average weight of a man is 72 kg; the average weight of a horse is 360 kg.

Obsv   Horse   Man
1      618     102
2      712     120
3      590     100
4      615     97
5      670     113
6      650     114
7      700     105
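One hedged way to formalize "proportionally" is to treat each row as a paired observation and test whether the mean horse-to-man load ratio equals the body-weight ratio 360/72 = 5. This reading of the claim is an assumption, as are the pairing of rows and the approximate normality of the ratios. The sketch computes a one-sample t statistic.

```python
import math
import statistics

horse = [618, 712, 590, 615, 670, 650, 700]
man = [102, 120, 100, 97, 113, 114, 105]

ratio0 = 360 / 72  # hypothesized load ratio under proportionality

# Observed horse-to-man load ratio for each paired observation.
ratios = [h / m for h, m in zip(horse, man)]
mean_ratio = statistics.mean(ratios)
sd_ratio = statistics.stdev(ratios)  # sample standard deviation (n - 1)

# One-sample t statistic for H0: mean ratio = ratio0, with n - 1 = 6 df.
t_stat = (mean_ratio - ratio0) / (sd_ratio / math.sqrt(len(ratios)))
```

The statistic would be compared with a t distribution on 6 degrees of freedom. A regression of horse load on man load through the origin is another formulation worth considering.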

values are indications of deviations away from a null hypothesis that people are neither good nor bad, but tabulae rasae.

Construct an appropriate set of hypothesis tests for Dr. Collins and Mr. Anderthal, given their prior beliefs. For both, specify what the type I and type II errors are. What do α and β denote in the context of the hypothesis tests you specify for each? What is the power, again in the context of the hypothesis tests you suggest?

6 Shorter Answers

Let X̄ denote the mean of a random sample of size 25 from a gamma-type distribution with α = 4 and β > 0. Use the central limit theorem to find an approximate 0.95 confidence interval for μ, the mean of the gamma distribution.
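A sketch of the derivation, assuming the gamma distribution is parameterized with shape α and scale β, so that the mean is αβ and the variance is αβ². Under the other common (rate) parameterization the algebra changes.

```latex
\[
  \mu = \alpha\beta = 4\beta, \qquad
  \sigma^2 = \alpha\beta^2 = 4\beta^2, \qquad
  \frac{\sigma}{\sqrt{n}} = \frac{2\beta}{5} = \frac{\mu}{10}.
\]
\[
  P\!\left( \left| \frac{\bar{X} - \mu}{\mu/10} \right| \le 1.96 \right) \approx 0.95
  \;\Longrightarrow\;
  \frac{\bar{X}}{1.196} \;\le\; \mu \;\le\; \frac{\bar{X}}{0.804}.
\]
```

The interval is not symmetric about the sample mean because the standard error itself depends on μ.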

Consider the next 1000 95 percent confidence intervals for μ that a statistical consultant will obtain for various clients. Suppose the data sets on which the intervals are based are selected independently of one another. How many of these 1000 intervals do you expect to capture the corresponding value of μ? What is the probability that between 940 and 960 of these intervals contain the corresponding value of μ?
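The number of intervals that capture μ can be modeled as Binomial(1000, 0.95), assuming each interval has exactly 95 percent coverage. The sketch below computes the expected count and the exact probability, reading "between 940 and 960" as inclusive of both endpoints (an assumption). A normal approximation is the more likely hand calculation.

```python
import math

n, p = 1000, 0.95
expected = n * p  # expected number of intervals that capture mu


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Exact probability that the count lies in 940..960, endpoints included.
prob = sum(binom_pmf(k, n, p) for k in range(940, 961))
```

The standard deviation of the count is sqrt(1000 × 0.95 × 0.05), roughly 6.9, so the range 940 to 960 spans about one and a half standard deviations on either side of the mean.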

Two different companies have applied to provide cable television service in a certain region. Let p denote the proportion of all potential subscribers who favor the first company over the second. Consider testing Ho: p = 0.5 vs. Ha: p > 0.5 based on a random sample of 25 individuals. Let X denote the number in the sample who favor the first company and x represent the observed value of X.

Describe what type I and type II errors are in this context. Compute α. Compute the probability of a type II error here, if the rejection region is R = {x : x ≥ 16}.
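A minimal sketch of the error-probability calculations, assuming X ~ Binomial(25, p). The significance level α is the probability of landing in R when p = 0.5. The type II error probability depends on which alternative value of p is true, and the assignment text does not fix one, so `beta` takes it as an argument. The function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def reject_prob(p, n=25, cutoff=16):
    """P(X >= cutoff) when the true proportion is p."""
    return sum(binom_pmf(k, n, p) for k in range(cutoff, n + 1))


alpha = reject_prob(0.5)  # type I error probability under Ho: p = 0.5


def beta(p_alt):
    """Type II error probability: failing to reject when p = p_alt > 0.5."""
    return 1.0 - reject_prob(p_alt)
```

For example, `beta(0.7)` gives the chance of missing a true preference of 70 percent, and the power at that alternative is `1 - beta(0.7)`.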

Let μ denote the mean reaction time to a certain stimulus. For a test of Ho: μ = μo vs. Ha: μ ≠ μo where σ^2 is the variance of this stimulus.

Find the p-value associated with each given, observed, z-statistic:

i) zo = 1.42

ii) zo = 0.9
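For a two-sided alternative, the p-value of an observed z-statistic is 2(1 − Φ(|z|)), which equals erfc(|z|/√2). The sketch below evaluates it with the standard library, reading the two observed statistics as 1.42 and 0.9 (an assumption about the garbled source values). The function name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


p_i = two_sided_p(1.42)  # part i)
p_ii = two_sided_p(0.9)  # part ii)
```

Both values can be checked against a standard normal table by doubling the upper-tail area beyond |z|.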

7 More Hypothesis Testing

A survey of 1024 families with 5 children yielded the distribution below. These families are clients of a fertility clinic that guarantees female births 3 times as often as male births. Is the result below consistent with this claim?

Number Girls/Boys    5/0   4/1   3/2   2/3   1/4   0/5   Total
Number Families      58    180   352   320   100   14    1024
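One reading of the clinic's claim is that each birth is a girl with probability 3/4, independently, so the number of girls in a family of 5 is Binomial(5, 0.75). Under that assumption, a chi-square goodness-of-fit statistic can be sketched as follows.

```python
import math

girls = [5, 4, 3, 2, 1, 0]
observed = [58, 180, 352, 320, 100, 14]
p_girl = 0.75  # assumed reading of "female births 3 times as often"

total = sum(observed)
# Expected family counts under Binomial(5, p_girl).
expected = [total * math.comb(5, g) * p_girl ** g * (1 - p_girl) ** (5 - g)
            for g in girls]

# Pearson chi-square statistic, 6 categories, so 5 degrees of freedom.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_05 = 11.070  # chi-square critical value, df = 5, level 0.05
```

The expected count for the 0-girl cell is small, so pooling it with the neighboring cell before testing is a reasonable refinement.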

A group of students in a Statistics class were assigned letter grades by a computer program. The results were tabulated as follows

Grade    A    B    C    D    F    Total
Obsvd.   15   25   32   17   11   100

The makers of the computer program claim that the distribution of grades can be fixed a priori. A professor sets the distribution of grades as follows: A - 3/10, B - 4/10, C - 2/10, D - 0.5/10, F - 0.5/10.

Test if this program is "working" to the manufacturer's specifications at the .05 significance level.
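A short sketch of the chi-square goodness-of-fit computation for the grade data, assuming the professor's proportions are the null distribution and the 100 grades are independent.

```python
observed = [15, 25, 32, 17, 11]        # A, B, C, D, F
probs = [0.3, 0.4, 0.2, 0.05, 0.05]    # professor's specified distribution

n = sum(observed)
expected = [n * p for p in probs]      # 30, 40, 20, 5, 5

# Pearson chi-square statistic, 5 categories, so 4 degrees of freedom.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_05 = 9.488  # chi-square critical value, df = 4, level 0.05

reject = chi2 > critical_05
```

Rejecting at the .05 level would indicate that the observed grades do not match the specified distribution.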