Name:

Statistics
Davidson College
Economics 204, Fall 2002
Mark C. Foley

Review #4
Suggested Solutions

Directions: This review is closed-book, closed-notes (except for your formula sheet and statistical tables), to be taken in one sitting not to exceed 4 hours. You may use a calculator. Perform your calculations to 3 decimal places. There are 100 points on the exam. Problem 1 is worth 20 points, Problem 2 is worth 30 points, and Problem 3 is worth 50 points. You must show all your work to receive full credit. Any assumptions you make and intermediate steps should be clearly indicated. Do not simply write down a final answer to the problems without an explanation. Please turn in your formula sheet and set of statistical tables with your exam.

Gather ye rosebuds while ye may.

Honor Pledge
Start time
End time

Problem 1

Let the sample regression line be $y_i = a + b x_i + e_i = \hat{y}_i + e_i$, $i = 1, 2, \ldots, n$. Let $\bar{x}$ and $\bar{y}$ denote sample means for the independent and dependent variables, respectively.

(a) Show that $e_i = (y_i - \bar{y}) - b(x_i - \bar{x})$.

$$y_i = a + b x_i + e_i \;\Rightarrow\; e_i = y_i - a - b x_i \;\Rightarrow\; e_i = y_i - (\bar{y} - b\bar{x}) - b x_i$$
$$\Rightarrow\; e_i = (y_i - \bar{y}) + b\bar{x} - b x_i \;\Rightarrow\; e_i = (y_i - \bar{y}) - b(x_i - \bar{x}) \;\blacksquare$$

(b) Show that $\sum_{i=1}^{n} e_i = 0$.

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - b(x_i - \bar{x}) \right] \quad \text{using part (a)'s result}$$
$$= \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{y} - b \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} \bar{x}$$
$$= n\bar{y} - n\bar{y} - b n \bar{x} + b n \bar{x} = 0 \;\blacksquare$$

(c) Show that $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 - b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$.

Again, using part (a),

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - b(x_i - \bar{x}) \right]^2$$
$$= \sum_{i=1}^{n} \left[ (y_i - \bar{y})^2 - 2b (y_i - \bar{y})(x_i - \bar{x}) + b^2 (x_i - \bar{x})^2 \right]$$
$$= \sum_{i=1}^{n} (y_i - \bar{y})^2 - 2b \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) + b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$= \sum_{i=1}^{n} (y_i - \bar{y})^2 - 2b \sum_{i=1}^{n} \left[ e_i + b(x_i - \bar{x}) \right](x_i - \bar{x}) + b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{substituting for } (y_i - \bar{y})$$

focus on the middle term for now …

$$H_0: \rho_{hw} = 0 \qquad H_1: \rho_{hw} > 0$$

First, note that for simple regression (i.e., with only 1 explanatory variable), $r$, the sample correlation coefficient, equals the square root of $R^2$ (taking the sign of the slope).

Test statistic:

$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{.8729}{\sqrt{(1 - .8729^2)/16}} = 7.156$$

Comparing the t-statistic to the critical value $t_{n-2,.01} = t_{16,.01} = 2.583$, we reject $H_0$ at the 1% level since 7.156 > 2.583.
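The test statistic above can be re-computed in a few lines as a check on the arithmetic. This is only a sketch: the function name `correlation_t_statistic` is an illustrative choice, and the critical value is typed in from the statistical table quoted above rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing a zero population correlation,
    based on the sample correlation r and sample size n (n - 2 d.f.)."""
    return r / math.sqrt((1.0 - r ** 2) / (n - 2))


r = 0.8729          # sample correlation quoted in the solution
n = 18              # sample size, so n - 2 = 16 degrees of freedom
t_crit_01 = 2.583   # t_{16,.01}, copied from the table value quoted in the solution

t_stat = correlation_t_statistic(r, n)
reject_at_1_percent = t_stat > t_crit_01

print(f"t = {t_stat:.3f}, reject H0 at 1%: {reject_at_1_percent}")
```

Writing the statistic as r divided by its estimated standard error mirrors the hand calculation, so each intermediate number can be compared with the work shown.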
We also reject $H_0$ at the 0.5% level since $7.156 > 2.921 = t_{16,.005}$.

(c) Interpret $R^2$.

$R^2 = .762$. This indicates that 76.2% of the variability in y is explained by the regression equation.

(d) Calculate and interpret a 95% confidence interval for the slope parameter $\beta$.

$$\hat{\beta} \pm t_{n-2,\alpha/2}\, s_{\hat{\beta}} = 5.586 \pm t_{16,.025}(.781) = 5.586 \pm 2.120(.781)$$

So the CI is (3.930, 7.242).

If repeated samples of size 18 are taken, and a 95% confidence interval is calculated for each sample, we expect that 95% of these CIs will contain the true value, $\beta$.

(e) Calculate and interpret a 90% confidence interval for the average weight of people who are 72 inches tall.

$$\hat{y}_{n+1} = -239.003 + 5.586\, x_{n+1} \;\Rightarrow\; \hat{y}_{n+1} = -239.003 + 5.586(72) = 163.189$$

For the forecast of $E[y_{n+1} \mid x_{n+1}]$, the confidence interval is

$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2} \sqrt{\left[ \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] s_e^2} = 163.189 \pm t_{16,.05} \sqrt{\left[ \frac{1}{18} + \frac{(72 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] (11.336)^2}$$

Now I meant to give you $\bar{x}$ and $\sum (x_i - \bar{x})^2$, but I didn't, so the enterprising student who wrote down all formulas from the book will have figured them out from the given information.

To find $\sum (x_i - \bar{x})^2$, we use two equations with two unknowns. Part (c) in Problem 1 and p. 389 indicate that

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 - b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and p. 390 has

$$R^2 = b^2\, \frac{\sum (x_i - \bar{x})^2}{\sum (y_i - \bar{y})^2}.$$

These become

$$2056.075 = \sum_{i=1}^{n} (y_i - \bar{y})^2 - 5.586^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad \text{and} \qquad .762 = 5.586^2\, \frac{\sum (x_i - \bar{x})^2}{\sum (y_i - \bar{y})^2}$$

Solving these yields

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = 8641.806 \qquad \text{and} \qquad \sum_{i=1}^{n} (x_i - \bar{x})^2 = 211.036$$

To find $\bar{x}$, we note that p. 392 indicates

$$s_a^2 = \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right] s_e^2.$$

This becomes

$$53.964^2 = \left[ \frac{1}{18} + \frac{\bar{x}^2}{211.036} \right] (128.505), \qquad \text{so } \bar{x} = 69.07$$

Back to the problem:

$$163.189 \pm 1.746 \sqrt{\left[ \frac{1}{18} + \frac{(72 - 69.07)^2}{211.036} \right] (11.336)^2}$$

which gives (157.049, 169.329).

If we take repeated samples of size 18 AND regress Weight on Height for each sample AND construct a 90% CI (of which this is just one CI) for each sample, then we expect 90% of these CIs to contain the AVERAGE weight for 72-inch-tall people.

Problem 3

Assume the administration is interested in the relationship between the distance a student lives from Chambers and the number of absences per semester.
The following data is collected:

Student   Number of Absences (y)   Distance from Chambers, 100's of feet (x)
1         3                        3
2         3                        10
3         1                        2
4         7                        15
5         2                        5

(a) Estimate the sample regression line for the regression of the number of absences on the distance a student lives from Chambers.

          y         x        x^2    y^2    xy     y-hat    e = y - y-hat   e^2
          3         3        9      9      9      1.708     1.292          1.6693
          3         10       100    9      30     4.319    -1.319          1.7398
          1         2        4      1      2      1.335     -.335           .1122
          7         15       225    49     105    6.184      .816           .6659
          2         5        25     4      10     2.454     -.454           .2061
SUM       16        35       363    72     156               0             4.3933
Average   16/5=3.2  35/5=7

$$\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{156 - 5(7)(3.2)}{363 - 5(7 \cdot 7)} = \frac{44}{118} = .373$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 3.2 - (.373)(7) = .589$$

$$\hat{y}_i = .589 + .373\, x_i \qquad \text{or} \qquad y_i = .589 + .373\, x_i + e_i$$

(b) Interpret the intercept and slope coefficients of the regression you estimated in part (a).

$b = \hat{\beta} = .373$: As distance from Chambers increases by 1 unit (= 100 feet), the average number of absences increases by .373 per semester.

$a = \hat{\alpha} = .589$: Since the data does not include values of x near 0 (200 feet is as close as the data gets), the intercept is simply an extrapolation of the sample regression line. Also, students cannot live in Chambers (0 feet), at least not full-time.

(c) Clearly evaluate the regression model's goodness of fit according to two different criteria.
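As a numerical cross-check of the hand calculations in part (a), the sketch below recomputes the slope, intercept, residuals, and sum of squared residuals from the five data points using the same sums-of-products formulas. The function name `simple_ols` and the variable names are illustrative choices. Full-precision results can differ from the table in the third decimal place, because the hand calculation rounds the slope to .373 before computing the intercept and fitted values.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for a one-variable regression,
    using the sum-of-products formulas from part (a)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(xi * xi for xi in x) - n * x_bar ** 2
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Data from the table: distance in 100's of feet (x), absences (y)
distance = [3, 10, 2, 15, 5]
absences = [3, 3, 1, 7, 2]

a, b = simple_ols(distance, absences)
residuals = [yi - (a + b * xi) for xi, yi in zip(distance, absences)]
sse = sum(e * e for e in residuals)

print(f"intercept = {a:.3f}, slope = {b:.3f}")
print(f"sum of residuals = {sum(residuals):.6f}, sum of squared residuals = {sse:.4f}")
```

The residuals summing to zero is the property shown in Problem 1(b), so it doubles as a sanity check on the fit.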