Midterm Exam with Solutions - Regression Analysis | ISQS 5349, Exams of Information Technology

Material Type: Exam; Class: REGRESSION ANALYSIS ; Subject: Infrmtion Sys and Quant Scienc; University: Texas Tech University; Term: Spring 2008;


Uploaded on 03/11/2009


ISQS 5349 Midterm, SP 08. Open notes, no book, points (out of 100) in parentheses.

  1. In the GPA/SAT example, Y=GPA and X=SAT. Here are the data.

GPA     SAT
3.54    1030
2.13     890
3.12    1210
… (many more records) …
2.10     890

1.A.(15) What is assumption (0) in this example? State it specifically and clearly, and define all math terms used in your explanation, all in terms of GPA and SAT.

Note: Do not explain why (0) is false. Do not explain why we assume (0). Just explain what we assume when we assume (0), in the context of GPA and SAT.

Solution: Assumption (0) states that, for every SAT=x, the GPA data are randomly generated from a probability distribution function (pdf) p(y|x). The pdf p(y|x) is the conceptual probability distribution of all possible GPA values that might occur when SAT=x. The essence of the assumption is that the GPAs are “randomly generated” from some pdfs, and that these pdfs may possibly depend on x.

(Note: There is no requirement here that the pdfs have to depend on x: the case where p(y|x) ≡p(y), not depending on x, is allowed as a special case, and violates no assumption).

1.B.(5) The estimated model is

predicted GPA = 1.8 + .0007(SAT).

How would the estimated model be expected to differ if a more accurate measure of “SAT” were available for each person in the sample?

Solution: We would expect the coefficient of the more accurately measured SAT to be higher than .0007, since measurement error in X biases the estimated slope coefficient towards zero.
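The attenuation effect can be checked by simulation. The following is a minimal sketch, not part of the exam: the true slope (0.002), the error standard deviations, and the sample size are hypothetical values chosen only for illustration. It fits the same simulated GPA values once against an accurately measured SAT and once against the same SAT with added measurement error.

```python
import random

def ols_slope(x, y):
    """Return the OLS slope of y regressed on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)
n = 20000
true_slope = 0.002  # hypothetical slope on the accurately measured SAT

# Accurately measured SAT and the GPA values it generates (hypothetical values).
true_sat = [rng.gauss(1000, 100) for _ in range(n)]
gpa = [1.0 + true_slope * s + rng.gauss(0, 0.4) for s in true_sat]

# The same SAT observed with measurement error (sd 100, also hypothetical).
noisy_sat = [s + rng.gauss(0, 100) for s in true_sat]

slope_accurate = ols_slope(true_sat, gpa)
slope_noisy = ols_slope(noisy_sat, gpa)
print(slope_accurate, slope_noisy)
```

In this setup the measurement-error variance equals the variance of the true SAT, so the slope on the noisy SAT should come out near half the slope on the accurate SAT.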

  2. A model is “good” if we can visualize our observable data as having been produced by the model. A model for generic (X,Y) data is

Y = β₀ + β₁X + ε, where ε ~ N(0, σ^2).

2.A.(15) How does this model “produce data”? Explain clearly.

Solution: The model is a recipe for producing data. Given an X=x value, a Y value is produced by sampling an ε from the pdf N(0, σ^2), and adding that ε to β₀ + β₁x to get the Y. Many data values can be produced this way, simply by sampling additional ε values from the same N(0, σ^2) pdf. Simulation can be used to generate such data if you specify putative parameter values (β₀, β₁, σ^2).
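The recipe can be written out directly. This is a minimal sketch under stated assumptions: the function name simulate is hypothetical, the putative values β₀ = 1.8 and β₁ = .0007 are borrowed from the estimated model in 1.B, and σ = 0.5 and the grid of x values are made up for illustration.

```python
import random

def simulate(xs, beta0, beta1, sigma, rng):
    """Produce one Y per x: Y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2)."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

rng = random.Random(1)
xs = [800 + 10 * i for i in range(61)]  # hypothetical SAT-like values, 800 to 1400
ys = simulate(xs, beta0=1.8, beta1=0.0007, sigma=0.5, rng=rng)

# Show the first few simulated (x, y) pairs.
for x, y in list(zip(xs, ys))[:5]:
    print(x, round(y, 2))
```

Calling simulate again with the same xs produces a different data set from the same model, which is the sense in which the model “produces data.”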

2.B.(15) What kind of data does the model produce? Using words and graphs, characterize the nature of the data produced by this model clearly.

Solution: The (X,Y) scatterplot should show a linear trend, with no curvature, with constant vertical variance across the x range, and with no noticeable outliers, skewness, or signs of discreteness in the distributions of Y shown in all vertical slices (one slice for each possible x). For example, see the scatterplot from the Excel Toluca spreadsheet (figure not reproduced in this text).

3.A.(15) The normality assumption is not absolutely necessary in regression analysis. Give one way in which it is not absolutely necessary.

Solution: I’ll give three ways. (i) The Gauss-Markov theorem states that the OLS estimates are still BLUE, even when (4) is violated. (ii) The CLT says that the estimators are approximately normally distributed, even when (4) is violated, so the usual t-tests and confidence intervals give approximately correct results when the sample size is large. (iii) The MLEs assuming normality are the OLS estimates. The OLS estimates have good properties even without assuming normality, since they are BLUE, but also because the LS criterion is meaningful outside of normality (i.e., it makes intuitive sense to minimize the sum of squared errors, even without the normality assumption, because the resulting line will pass nearly through the middle of the data).
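Points (i) and (ii) can be illustrated by simulation. The sketch below is not from the exam: it assumes hypothetical true parameters, a fixed x grid of 50 points, and deliberately skewed errors (an exponential shifted to have mean 0 and variance 1), so that normality fails. It then checks whether the OLS slope estimates still center on the true slope and whether roughly 95% of them fall within 1.96 standard errors of it, as a normal approximation would suggest.

```python
import math
import random

rng = random.Random(2)
xs = list(range(50))
n = len(xs)
xbar = sum(xs) / n
sxx = sum((x - xbar) ** 2 for x in xs)

beta0, beta1 = 1.0, 0.5            # hypothetical true parameters
se_slope = 1.0 / math.sqrt(sxx)    # true sd of the slope estimate (error sd is 1)

slopes = []
for _ in range(2000):
    # Skewed errors: Exp(1) shifted to mean 0 (variance 1), so normality fails.
    ys = [beta0 + beta1 * x + (rng.expovariate(1.0) - 1.0) for x in xs]
    ybar = sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    slopes.append(b1)

mean_slope = sum(slopes) / len(slopes)
within = sum(abs(b - beta1) <= 1.96 * se_slope for b in slopes) / len(slopes)
print(mean_slope, within)
```

How close the second number comes to 0.95 depends on the sample size and on how non-normal the errors are, which is the caveat raised in 3.B.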

3.B.(15) If the normality assumption fails in regression analysis, what can go wrong? Give one example.

Solution: Again, I’ll give three. (i) Normality does not respect bounds. So if you want to use the data to construct a prediction interval for Y using the quick-and-dirty interval Ŷ ± 2√MSE, the interval might easily exceed the known bounds. This happened in the GPA prediction interval, where the interval for Joe had an upper bound greater than 4.0. (ii) When (4) is violated, the inferences (intervals, tests) can only be approximate. While the CLT states that the approximations become better as n increases, the goodness of the approximation also depends on the degree of non-normality of the process that is sampled. So if the process sampled is badly non-normal, the approximations can be very poor. (iii) MLEs are OLS estimates under (0)-(4). While the OLS estimates are BLUE even when (4) is violated, recall that BLUE means “best in the class of linear unbiased estimators.” MLEs that assume models other than (0)-(4) are typically nonlinear functions of the data. So, while the OLS estimates are best in the class of LUEs, even when (4) is violated, there are other nonlinear estimators that are better.
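Point (i) is simple arithmetic. As a sketch with made-up numbers (these are not Joe's actual values, which are not given here), suppose the predicted GPA is 3.4 and MSE is 0.16; the function name quick_interval is hypothetical.

```python
import math

def quick_interval(y_hat, mse):
    """Quick-and-dirty prediction interval: y_hat plus or minus 2*sqrt(MSE)."""
    half_width = 2.0 * math.sqrt(mse)
    return y_hat - half_width, y_hat + half_width

# Hypothetical predicted GPA and MSE, chosen only to illustrate the problem.
lower, upper = quick_interval(y_hat=3.4, mse=0.16)
exceeds_bound = upper > 4.0  # GPA cannot exceed 4.0
print(lower, upper, exceeds_bound)
```

With these values the upper limit lands above 4.0, a GPA that cannot occur, because the normal model places probability beyond the bound.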

Comment: In retrospect, I would have rephrased 3.A. as “Give one way in which it is not absolutely necessary for the usual OLS analysis,” and 3.B. as “what can go wrong for the usual OLS analysis.” There are other answers available when you consider non-OLS likelihood-based methods, and I gave credit for these answers.

4.(20) Usually, researchers prefer to “Reject H₀: β₁ = 0” rather than “Fail to reject H₀: β₁ = 0.” Using a specific example of your choosing, explain why the predictions the researcher can make when H₀: β₁ = 0 is rejected are preferred to the predictions the researcher can make when H₀: β₁ = 0 is not rejected.

Solution: See the class web page “Confidence intervals and significance tests as predictions” (http://courses.ttu.edu/isqs5349-westfall/images/5349/conf_int_sig_test_pred.htm) for specific discussion of how hypothesis tests can be interpreted as predictions.

If a researcher rejects H₀: β₁ = 0, then we know that the confidence interval for β₁ excludes 0. We also know that the interval is centered at the estimated β₁; i.e., it is centered at the OLS estimate b₁. Assume b₁ > 0 for the sake of argument; the case b₁ < 0 can be handled by changing “greater than” to “less than” in the following discussion, and is omitted for brevity.

Consider the GPA/SAT case. Having rejected H₀: β₁ = 0, the researcher can predict confidently that if a large population of students having SAT=1000 is observed, and if another large population of students having SAT=1100 is observed, then the average GPA for the latter population will be greater than the average GPA for the former population.

(Note: This prediction requires a subjective assessment that the data generating process for these new data is no different from the data generating process for the observed data.)

On the other hand, if the researcher does not reject H₀: β₁ = 0, then the confidence interval for β₁ includes 0. In this case, if a large population of students having SAT=1000 is observed, and if another large population of students having SAT=1100 is observed, then the average GPA for the latter population might be greater than the average GPA for the former population, or it might be less than the average GPA for the former population. In other words, the data are ambiguous with respect to determining the effect of SAT on GPA.

The “reject H₀: β₁ = 0” outcome is preferred, therefore, because an informative prediction can be made. Predicting that one average will be greater than the other is an informative conclusion; the inability to make any such prediction, as occurs when the researcher “fails to reject H₀: β₁ = 0,” is not an informative conclusion.
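The link between the test and the prediction can be sketched in code. This is an illustration under stated assumptions, not the course's own material: the data are simulated with hypothetical parameter values, the helper names slope_interval and sign_prediction are made up, and a normal quantile stands in for the t quantile (reasonable only because n is large here). Since the difference in average GPA between SAT=1100 and SAT=1000 is 100·β₁, the sign of that difference can be predicted only when the interval for β₁ excludes 0.

```python
import math
import random
from statistics import NormalDist

def slope_interval(xs, ys, level=0.95):
    """Approximate confidence interval for the slope (normal quantile, large n)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # stand-in for the t quantile
    half_width = z * math.sqrt(mse / sxx)
    return b1 - half_width, b1 + half_width

def sign_prediction(lo, hi):
    """Prediction about mean GPA at SAT=1100 versus SAT=1000."""
    if lo > 0:
        return "greater"
    if hi < 0:
        return "less"
    return "undetermined"  # interval includes 0: fail to reject

rng = random.Random(3)
xs = [rng.uniform(800, 1400) for _ in range(2000)]
ys_effect = [1.8 + 0.0007 * x + rng.gauss(0, 0.5) for x in xs]  # hypothetical slope
ys_null = [2.5 + rng.gauss(0, 0.5) for _ in xs]                 # true slope is 0

lo_e, hi_e = slope_interval(xs, ys_effect)
lo_n, hi_n = slope_interval(xs, ys_null)
print(sign_prediction(lo_e, hi_e), sign_prediction(lo_n, hi_n))
```

For the second data set the interval will usually, though not always, include 0, in which case no direction can be predicted.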