STAT 4230/6230 Homework #2 Solutions
February 17, 2008

1. Exercise 4.4 (10 Points)

a. The first-order model for mean annual earnings, \(E(y)\), as a function of age (\(x_1\)) and hours worked per day (\(x_2\)) is \(E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2\).

b. The least squares prediction equation is \(\hat{y} = -20.35201 + 13.35045 x_1 + 243.71446 x_2\).

c. The term \(\hat{\beta}_0 = -20.35201\) has no practical meaning because \(x_1 = 0\) and \(x_2 = 0\) are not in the observed range. For \(\hat{\beta}_1 = 13.35045\), the mean annual earnings is estimated to increase by $13.35045 for each additional year of age, holding the hours worked per day constant. For \(\hat{\beta}_2 = 243.71446\), we estimate that the mean annual earnings will increase by $243.71446 for each additional hour worked per day, holding age constant.

d. To test whether age (\(x_1\)) is a statistically useful predictor of annual earnings, we test the following hypotheses with \(\alpha = 0.01\):
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
From the output, the t statistic is 1.74, and the two-sided p-value is 0.1074. Because we are carrying out a two-sided hypothesis test, this is the p-value for the test. Because the p-value is greater than \(\alpha = 0.01\), we fail to reject the null hypothesis. There is insufficient evidence to indicate that age is a useful predictor of annual earnings, adjusting for hours worked per day, at level \(\alpha = 0.01\).

e. From the output, the 95% confidence interval for \(\beta_2\) is (105.33428, 382.09465). We are 95% confident that for each additional hour worked per day, the mean annual earnings will increase by anywhere from $105.33 to $382.09, holding age constant.

2. Exercise 4.10 (10 Points)

a. The model relating the mean college GPA, \(E(y)\), as a linear function of the three explanatory variables is \(E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\).
The interpretations of the parameters are as follows:
• \(\beta_0\) does not have any practical meaning because we would not observe a high school GPA of 0 or an SAT score of 0.
• \(\beta_1\) is the change in the mean college GPA for each unit change in high school GPA, with all other variables held constant;
• \(\beta_2\) is the change in the mean college GPA for each unit change in study hours, with all other variables held constant; and
• \(\beta_3\) is the change in the mean college GPA for each unit change in the SAT score, with all other variables held constant.

b. For both models, the estimate of the \(\beta\) parameter for the SAT score is \(\hat{\beta}_3 = 0.001\). We estimate that the mean college GPA will increase by 0.001 point for each additional point scored on the SAT, with the other two variables (high school GPA and hours of studying in a season) held constant.

c. Here, we are not given a specific value of \(\alpha\), so we will use a common one: \(\alpha = 0.05\). If \(\alpha\) is not specified in the problem, you should state your choice before even looking at p-values. The reported p-value for testing \(H_0: \beta_1 = 0\) is 0.734 for the black athletes' model and 0.000 for the white athletes' model. We interpret the p-values as follows:
• The p-value for the black athletes' model for testing \(H_0\) is 0.734 > 0.05. Thus, in light of this p-value, we fail to reject the null hypothesis. We do not have sufficient evidence of a linear relationship between high school GPA and college GPA, adjusting for study hours and SAT scores, for black athletes at level \(\alpha = 0.05\).
• Since the p-value for the white athletes' model is so small (0.000 < 0.05), there is strong evidence to reject \(H_0\). There is sufficient evidence of a linear relationship between high school GPA and college GPA, adjusting for study hours and SAT scores, for white athletes at level \(\alpha = 0.05\).

3. Exercise 4.14 (10 Points)

a.
From the SAS output, we see that \(R^2\) (R-Square) \(= 0.5823\). This means that 58.23% of the sample variability of annual earnings about the mean is explained by the linear relationship between annual earnings and the independent variables age and hours worked per day.

b. The SAS output shows that \(R_a^2\) (Adj R-Sq) \(= 0.5126\). Thus, 51.26% of the sample variability of annual earnings about the mean is explained by the linear relationship between annual earnings and the independent variables age and hours worked per day, after adjusting for the sample size and the number of \(\beta\) parameters included in the model.

c. We set \(\alpha = 0.01\) and conduct a test of the global utility of the model. The hypotheses we test are
\(H_0: \beta_1 = \beta_2 = 0\)
\(H_a:\) At least one \(\beta_i \neq 0\), \(i = 1, 2\).
The output shows that the F statistic is 8.36, which corresponds to a p-value of 0.0053. Because the p-value is less than \(\alpha = 0.01\), we reject the null hypothesis and find sufficient evidence that at least one of the independent variables (age or hours worked per day) is useful for predicting annual earnings.
Another way of doing this problem is to compare the F statistic with a critical F value, which in this case is \(F_{0.01,2,12} = 6.926608\). Since \(F > F_{0.01,2,12}\), we reject the null hypothesis and reach the same conclusion.

4. Exercise 4.21 (10 Points)

a. For the independent variable rental price, we see that \(\hat{\beta}_1 = 2.87\). Thus, we estimate that the mean number of homeless per 100,000 population will increase by 2.87 for each dollar increase in the rental price (10th percentile), with all other independent variables held constant.

b. We set the significance level at \(\alpha = 0.05\), as given by the problem. To test the hypothesis that the incidence of homelessness decreases as employment growth increases, we test
\(H_0: \beta_4 = 0\)
\(H_a: \beta_4 < 0\)
To test for a negative relationship, we test whether the appropriate parameter is negative. Here, we see that the t statistic is -2.71. With \(n - (k + 1) = 50 - (16 + 1) = 33\) degrees of freedom

Lastly, we test whether there is sufficient evidence to indicate that the relationship between support for a military resolution and race depends on political knowledge. At level \(\alpha = 0.05\), we test the hypotheses
\(H_0: \beta_9 = 0\)
\(H_a: \beta_9 \neq 0\)
The two-tailed p-value is 0.08 > 0.05, which means that we fail to reject the null hypothesis. Therefore, there is insufficient evidence to indicate that the relationship between support for a military resolution and race depends on political knowledge, with all other variables held constant.

Because \(R^2 = 0.194\), 19.4% of the variation in support for the military resolution is explained by the model containing the seven independent variables and the two interaction terms.

f. The hypotheses are
\(H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = \beta_9 = 0\)
\(H_a:\) At least one \(\beta_i \neq 0\), \(i = 1, 2, 3, \ldots, 9\).
The test statistic is
\[ F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{0.194/9}{(1 - 0.194)/[1763 - (9 + 1)]} = 46.88. \]
The rejection region requires \(\alpha = 0.05\) in the upper tail of the F distribution with \(\nu_1 = k = 9\) and \(\nu_2 = n - (k + 1) = 1753\) degrees of freedom. The rejection region is \(F > F_{0.05,9,1753} \approx 1.88\). Because the observed F statistic falls into the rejection region (\(F = 46.88 > 1.88\)), we reject the null hypothesis. There is sufficient evidence to indicate that the model is useful at level \(\alpha = 0.05\).

7. Exercise 4.28 (10 Points)

a. If meaningful, the term \(\hat{\beta}_0 = 325{,}790\) would mean that the mean percentage of motor vehicles without catalytic converters is 325,790% when the year is 0. However, it has no practical interpretation because \(x = 0\) is not in the observed range. We have data for the years 1984 to 1999.

b. The term \(\hat{\beta}_1 = -321.67\) should not be interpreted as a slope because of the presence of the quadratic term, \(x^2\).
It is just a shift parameter and has no practical interpretation.

c. The value of \(\hat{\beta}_2 = 0.794\) is positive, indicating an upward curvature in the sample data.

d. Since we have no idea of the relationship between y and x outside of the observed range, we should not use the least squares prediction equation to predict the value of y for an x outside that range. In this case, 2027 is well outside the range (1984, 1999).

a. The 95% prediction interval for y when \(x_1 = 45\) and \(x_2 = 10\) is (1760, 4275). With 95% confidence, we conclude that the actual annual earnings for a street vendor of age 45 who works 10 hours a day are between $1760.00 and $4275.00.

b. The 95% confidence interval for \(E(y)\) when \(x_1 = 45\) and \(x_2 = 10\) is (2620, 3415). With 95% confidence, we conclude that the mean annual earnings for a 45-year-old street vendor working 10 hours a day are between $2620.00 and $3415.00.

c. Yes. Individuals are more variable than means, so the interval for an individual value of y has a larger margin of error than the interval for the mean of y, \(E(y)\).

d. The test statistic is
\[ F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} \]

9. Exercise 4.78 (10 Points)

a. To determine whether the model is adequate for predicting subordinate performance y at level \(\alpha = 0.10\), we test the following hypotheses:
\(H_0: \beta_1 = \beta_2 = \beta_3 = 0\)
\(H_a:\) At least one of the \(\beta_i \neq 0\) for \(i = 1, 2, 3\).
The test statistic (formula on p. 185 of the textbook) is
\[ F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{0.22/3}{(1 - 0.22)/[89 - (3 + 1)]} = 7.99145. \]
The p-value is
\[ P(F_{3,85} > 7.99145) = 0.000093 < 0.10 = \alpha. \]
Thus, we reject the null hypothesis. We conclude that there is sufficient evidence to indicate the model is adequate for predicting subordinate performance at level \(\alpha = 0.10\). Alternatively, we can compare the F statistic with the critical value of the F distribution (where \(\alpha = 0.10\) is the area in the upper tail). This critical value is \(F_{0.10,3,85} \approx 2.15\), so the rejection region is \(\{F : F > 2.15\}\).
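The global F test in part a can be checked numerically from \(R^2\), n and k alone. The sketch below assumes SciPy is available; the helper name `global_f_test` is hypothetical and is not taken from the textbook or its software. It computes \(F = (R^2/k)/((1 - R^2)/[n - (k + 1)])\), then uses SciPy's F distribution for the upper-tail p-value and the critical value.

```python
from scipy import stats


def global_f_test(r2: float, n: int, k: int) -> tuple[float, float]:
    """Global F test of H0: beta_1 = ... = beta_k = 0, computed from R^2.

    Uses F = (R^2 / k) / ((1 - R^2) / (n - (k + 1))) with
    k numerator and n - (k + 1) denominator degrees of freedom.
    Returns (F statistic, upper-tail p-value).
    """
    df_model = k
    df_error = n - (k + 1)
    f_stat = (r2 / df_model) / ((1 - r2) / df_error)
    p_value = stats.f.sf(f_stat, df_model, df_error)  # P(F > f_stat)
    return f_stat, p_value


# Exercise 4.78(a): R^2 = 0.22, n = 89, k = 3, alpha = 0.10
f_stat, p_value = global_f_test(r2=0.22, n=89, k=3)
f_crit = stats.f.ppf(1 - 0.10, 3, 89 - (3 + 1))  # critical value F_{0.10, 3, 85}
print(f"F = {f_stat:.5f}, p = {p_value:.6f}, F_crit = {f_crit:.2f}")
```

The same helper applied to the military-resolution exercise, `global_f_test(0.194, 1763, 9)`, gives F of about 46.88, matching the hand calculation there.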
b. For \(x_2 = 1\) (low conflict legitimization),
\[ \hat{y} = 7.09 - 0.44x_1 - 0.01(1) + 0.06x_1(1) = 7.08 - 0.38x_1. \]
For \(x_2 = 7\) (high conflict legitimization),
\[ \hat{y} = 7.09 - 0.44x_1 - 0.01(7) + 0.06x_1(7) = 7.02 - 0.02x_1. \]
Below is the graph with the lines for \(\hat{y}\) as a function of \(x_1\) when \(x_2 = 1\) and \(x_2 = 7\), respectively. The dotted line corresponds to high conflict legitimization, and the solid line corresponds to low conflict legitimization. We see that for \(x_2 = 1\) (low conflict legitimization), \(\hat{y}\) declines much faster with increasing \(x_1\).

[Figure: Plot of Least Squares Prediction Equation for Subordinate Performance as a Function of Group Decision Method for Low and High Conflict Legitimization; lines \(\hat{y} = 7.02 - 0.02x_1\) and \(\hat{y} = 7.08 - 0.38x_1\).]

c. To determine whether the relationship between subordinate performance (y) and the manager's use of a group decision method (\(x_1\)) depends on the manager's legitimization of conflict (\(x_2\)), we test the following hypotheses:
\(H_0: \beta_3 = 0\)
\(H_a: \beta_3 \neq 0\)
We use the significance level \(\alpha = 0.10\), as prescribed in the problem. The t statistic is 1.85, with \(n - (k + 1) = 89 - (3 + 1) = 85\) degrees of freedom. Thus, the p-value is
\[ p = 2P(t_{85} > 1.85) = 0.067788 < 0.10. \]
Because the p-value falls below the significance level, we reject the null hypothesis. Thus, at level \(\alpha = 0.10\), the relationship between subordinate performance and the manager's use of the group decision method depends on the manager's legitimization of conflict. Alternatively, the critical t value is \(t_{0.05,85} = 1.663\); since 1.85 > 1.663, we reach the same conclusion.

d. Based on the results of part c, the researchers should not conduct t-tests on \(\beta_1\) and \(\beta_2\). If an interaction term is significant, the main effects of each of the variables may be masked. According to the textbook (p. 195), if an interaction term is significant, the main effect terms should be included in the model, regardless of the magnitude of their p-values.
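The two prediction lines in part b and the interaction test in part c can be reproduced with a short script. This is a sketch assuming SciPy is available; the constant names `B0`, `B1`, `B2`, `B3` and the helper `line_for_x2` are hypothetical, and the coefficients are the rounded estimates quoted in part b.

```python
from scipy import stats

# Rounded estimates from Exercise 4.78(b):
# y_hat = 7.09 - 0.44*x1 - 0.01*x2 + 0.06*x1*x2
B0, B1, B2, B3 = 7.09, -0.44, -0.01, 0.06


def line_for_x2(x2: float) -> tuple[float, float]:
    """Return (intercept, slope in x1) of the prediction line at a fixed x2.

    With the interaction term, the slope in x1 is B1 + B3*x2, so it depends on x2.
    """
    intercept = B0 + B2 * x2
    slope = B1 + B3 * x2
    return intercept, slope


low_intercept, low_slope = line_for_x2(1)    # low conflict legitimization
high_intercept, high_slope = line_for_x2(7)  # high conflict legitimization
print(f"x2 = 1: y_hat = {low_intercept:.2f} + ({low_slope:.2f}) x1")
print(f"x2 = 7: y_hat = {high_intercept:.2f} + ({high_slope:.2f}) x1")

# Exercise 4.78(c): two-sided t test of H0: beta_3 = 0, t = 1.85 on 85 df.
t_stat, df, alpha = 1.85, 85, 0.10
p_value = 2 * stats.t.sf(t_stat, df)     # 2 * P(t_85 > 1.85)
t_crit = stats.t.ppf(1 - alpha / 2, df)  # t_{0.05, 85}
print(f"p = {p_value:.6f}, critical t = {t_crit:.3f}, reject H0: {p_value < alpha}")
```

Because the slope in \(x_1\) is \(\beta_1 + \beta_3 x_2\), the interaction coefficient controls how quickly \(\hat{y}\) falls with \(x_1\) at each level of \(x_2\), which is why the two lines in the figure have such different slopes.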
10. Exercise 4.80 (10 Points)

a. We interpret the model coefficients as follows:
• \(\hat{\beta}_0 = -105\): If meaningful, we would have estimated the mean daily admissions for overcast weekdays with a predicted daily high of 0°F to be -105. However, this is not a very practical interpretation because it requires extrapolation.
• \(\hat{\beta}_1 = 25\): We estimate the difference between weekend and weekday mean daily admissions to be 25, holding the weather conditions and temperature constant.