




Lecture 7. Material Type: Notes; Class: ELEM STAT INFERENCE; Subject: STATISTICS; University: Texas A&M University; Term: Unknown 1989
Typology: Study notes





Recall that correlation measures the direction and strength of the linear (straight line) relationship between two variables. If a scatterplot shows a linear relationship, we would like to draw a line on the plot to summarize the relationship. This line is called a regression line.
Fitting a line to the data
Fitting a line to the data means drawing a line that comes as close as possible to the points. The equation of a line fitted to the data gives a compact description of the dependence of the response variable y on the explanatory variable x. The following is a review of basic features of a straight line.
Example 2.3.1 Consider the straight line y = −6 + 3x. Answer the following questions:
(a) What are the slope and intercept of the line?
(b) In the following table we give several x-values. Complete the table by computing the corresponding y-values.

   x  |  y
  −1  |
   0  |
   2  |

(c) Plot the line using the table in (b).
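As a quick check on part (b), the line y = −6 + 3x can be evaluated at the three tabulated x-values. The sketch below is only an illustration; the function name `line` and its default arguments are assumptions made for this example, not part of the notes.

```python
def line(x, intercept=-6.0, slope=3.0):
    """Return y = intercept + slope * x (defaults match y = -6 + 3x)."""
    return intercept + slope * x

# x-values taken from the table in part (b)
x_values = [-1, 0, 2]
table = [(x, line(x)) for x in x_values]

for x, y in table:
    print(f"x = {x:>2}   y = {y:>5.1f}")
```

Each unit increase in x raises y by the slope, 3, and the value at x = 0 is the intercept, −6.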
We represent n observations on two variables x and y as
(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
Our goal is to approximate the response values y_1 by a + bx_1, y_2 by a + bx_2, and so on, for some a and b. How do we find such a and b? If we approximate y by a + bx, we make an error of y − a − bx. We choose a and b to minimize the sum of the squared errors, i.e.,
(y_1 − a − bx_1)^2 + (y_2 − a − bx_2)^2 + ... + (y_n − a − bx_n)^2.
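The minimization above has a standard closed-form solution, which this excerpt does not derive: the slope is b = Sxy / Sxx and the intercept is a = ȳ − b·x̄, where Sxy and Sxx are sums of products and squares of deviations from the means. The following is a minimal sketch under that assumption; the function names and the five data points are hypothetical and chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) minimizing the sum of (y - a - b*x)**2 over the data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared errors y - a - b*x for a candidate line."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the notes
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {best:.4f}")
```

Any other choice of a and b, for example nudging the slope slightly, gives a sum of squared errors at least as large as `best`.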
Example 2.3.3 The Kalama growth data. How do children grow? The pattern of growth varies from child to child, so we can best understand the general pattern by following the average height of a number of children. Researchers monitored the growth of a group of 161 children from the village of Kalama in Egypt. The response variable is the mean height of the group and the explanatory variable is the age of the children in months.
The least-squares regression line for these data is: y = 64.93 + 0.635x.
(a) Predict the mean height of Kalama children at 19.5 months of age.
(b) Predict the mean height of Kalama children at 32 months of age.
(c) Is there any reason that would make you doubt the accuracy of the prediction in part (b)? If so, why?
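Parts (a) and (b) amount to substituting an age into the fitted line. The sketch below does only that; the names `INTERCEPT`, `SLOPE`, and `predict_mean_height` are assumptions introduced for this example, and the heights are in whatever units the original data use. It does not settle part (c): whether the prediction at 32 months can be trusted depends on whether that age lies within the range of ages actually observed, which this excerpt does not list.

```python
# Coefficients of the fitted line y = 64.93 + 0.635x from the notes
INTERCEPT = 64.93
SLOPE = 0.635


def predict_mean_height(age_months):
    """Predicted mean height at the given age, using the fitted line."""
    return INTERCEPT + SLOPE * age_months


height_19_5 = predict_mean_height(19.5)
height_32 = predict_mean_height(32)

print(f"age 19.5 months: {height_19_5:.2f}")
print(f"age 32 months:   {height_32:.2f}")
```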
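Correlation and regression
There is a close connection between correlation and regression: the square of the correlation, r^2, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.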
Example 2.3.3 continued. In the Kalama growth data example, r^2 = 0.988. Because r^2 is very close to 1, the regression line describes the data almost perfectly.
In other words, the explanatory variable x (age in months) can “explain” 98.8% of the variability of the response variable y (mean height).
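The phrase "explains 98.8% of the variability" can be made concrete: for a least-squares line, r^2 equals 1 − SSE/SST, where SSE is the sum of squared errors around the line and SST is the sum of squared deviations of y from its mean. The sketch below checks that identity numerically. The data and all function names are hypothetical; they are not the Kalama measurements, which this excerpt does not reproduce.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def correlation(xs, ys):
    """Sample correlation r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def fraction_explained(xs, ys):
    """1 - SSE/SST for the least-squares line."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data, not taken from the notes
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)
explained = fraction_explained(xs, ys)
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, 1 - SSE/SST = {explained:.4f}")
```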
Example 2.3.4 A study of class attendance and grades among first-year students at a state university showed that, in general, students who attended a higher percentage of their classes earned higher grades. Class attendance explained 16% of the variation in grade index among the students. What is the numerical value of the correlation between percent of classes attended and grade index?
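One way to approach Example 2.3.4 is to run the r^2 relationship backwards: the magnitude of r is the square root of the fraction of variation explained, and the sign follows the direction of the association. Here the association is positive, since higher attendance goes with higher grades. The helper below is a hypothetical illustration, not a function from any library.

```python
import math


def correlation_from_r_squared(r_squared, positive_association):
    """Recover r from r^2, taking the sign from the direction of association."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r^2 must lie between 0 and 1")
    magnitude = math.sqrt(r_squared)
    return magnitude if positive_association else -magnitude


# 16% of the variation explained, with a positive association
r_attendance = correlation_from_r_squared(0.16, positive_association=True)
print(f"r = {r_attendance:.2f}")
```

Note that r^2 alone cannot distinguish r = 0.4 from r = −0.4; the direction has to come from the description of the data.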