Least Squares Regression - Lecture Notes | STAT 201, Study notes of Statistics

lecture 7 Material Type: Notes; Class: ELEM STAT INFERENCE; Subject: STATISTICS; University: Texas A&M University; Term: Unknown 1989;

Uploaded on 02/13/2009

koofers-user-249
Least-Squares Regression
Recall that correlation measures the direction and strength of the linear
(straight line) relationship between two variables. If a scatterplot shows a
linear relationship, we would like to draw a line on the plot to summarize
the relationship. This line is called a regression line.

Fitting a line to the data

Fitting a line to the data means drawing a line that comes as close as possible to the points. The equation of a line fitted to the data gives a compact description of the dependence of the response variable y on the explanatory variable x. The following is a review of basic features of a straight line.

Example 2.3.1 Consider the straight line: y = −6 + 3x. Answer the following questions:

(a) What are the slope and intercept of the line?

(b) The following table gives several x-values. Complete the table by computing the corresponding y-values.

x      y
−1
0
2
1.1

(c) Plot the line using the table in (b).
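As a quick check on part (b), the line can be evaluated at each x-value. The sketch below is only illustrative: the function name `line` and the list `xs` (holding the integer x-values from the table) are assumptions made for this example, not part of the notes.

```python
# Evaluate the line y = -6 + 3x from Example 2.3.1.
# `line` and `xs` are illustrative names, not taken from the notes.

def line(x, intercept=-6.0, slope=3.0):
    """Return y = intercept + slope * x."""
    return intercept + slope * x

xs = [-1, 0, 2]
table = [(x, line(x)) for x in xs]

for x, y in table:
    print(f"x = {x:>2}   y = {y:>5.1f}")
```

Reading the equation directly also answers part (a): the multiplier of x is the slope and the constant term is the intercept.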

We represent n observations on two variables x and y as

$$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n).$$

Our goal is to approximate the response values $y_1$ by $a + bx_1$, $y_2$ by $a + bx_2$, and so on, for some a and b. How do we find such a and b? If we approximate y by a + bx, we make an error of y − a − bx. We choose a and b to minimize the sum of the squared errors, i.e.,

$$(y_1 - a - bx_1)^2 + (y_2 - a - bx_2)^2 + \dots + (y_n - a - bx_n)^2.$$
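The minimization can be made concrete with a small numerical sketch. The data below are hypothetical (invented for illustration), and the function names `sse` and `least_squares` are assumptions of this example. The code uses the standard closed-form solution written in terms of deviations from the means, then checks that nudging a or b away from the solution makes the sum of squared errors larger.

```python
# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(a, b, xs, ys):
    """Sum of squared errors (y - a - b*x)^2 over all observations."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared errors (standard closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

a, b = least_squares(xs, ys)
best = sse(a, b, xs, ys)

# Moving a or b slightly in any direction should increase the sum of squares.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
nudged_is_worse = all(sse(a + da, b + db, xs, ys) > best for da, db in nudges)

print(f"a = {a:.3f}, b = {b:.3f}, SSE = {best:.4f}")
print("every nudge increases the SSE:", nudged_is_worse)
```

Checking a handful of nudges is a sanity check, not a proof; the closed form itself comes from setting the derivatives of the sum of squares with respect to a and b to zero.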

  • The expression for the slope, $b = r\,s_y/s_x$ (where r is the correlation and $s_x$, $s_y$ are the standard deviations of x and y), says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
  • The slope, b, is the amount by which the predicted y changes when x increases by one unit.
  • The intercept, a, is the predicted value of y when x = 0.
  • The least-squares regression line ALWAYS passes through the point $(\bar{x}, \bar{y})$.
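The four facts above can be checked numerically. The sketch below uses hypothetical data and the standard formulas b = r·s_y/s_x and a = ȳ − b·x̄; the variable names and the helper `predict` are assumptions of this example.

```python
import math

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample standard deviations (divisor n - 1).
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

# Correlation: average product of standardized values (divisor n - 1).
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept

def predict(x):
    """Predicted y on the least-squares line."""
    return a + b * x

# One standard deviation in x corresponds to r standard deviations in y.
change_per_sd = predict(x_bar + s_x) - predict(x_bar)
print(math.isclose(change_per_sd, r * s_y))
# One unit in x changes the predicted y by b.
print(math.isclose(predict(1.0) - predict(0.0), b))
# The predicted y at x = 0 is the intercept.
print(predict(0.0) == a)
# The line passes through (x_bar, y_bar).
print(math.isclose(predict(x_bar), y_bar))
```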

Example 2.3.3 The Kalama growth data. How do children grow? The pattern of growth varies from child to child, so we can best understand the general pattern by following the average height of a number of children. Researchers monitored the growth of a group of 161 children from the village of Kalama in Egypt. The response variable is the mean height of the group and the explanatory variable is the age of the children in months.

The least-squares regression line for these data is: y = 64.93 + 0.635x.

(a) Predict the mean height of Kalama children at 19.5 months of age.

(b) Predict the mean height of Kalama children at 32 months of age.

(c) Is there any reason that would make you doubt the accuracy of the prediction in part (b)? If so, why?
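Parts (a) and (b) amount to substituting an age into the fitted line. A minimal sketch follows; the function name `predict_height` is an assumption of this example, and the coefficients are the ones given in the example.

```python
# Predictions from the fitted line y = 64.93 + 0.635x of Example 2.3.3.
# `predict_height` is an illustrative name, not from the notes.

def predict_height(age_months, intercept=64.93, slope=0.635):
    """Predicted mean height at the given age in months."""
    return intercept + slope * age_months

predictions = {age: predict_height(age) for age in (19.5, 32)}

for age, height in predictions.items():
    print(f"age {age} months: predicted mean height {height:.2f}")
```

The arithmetic is the same for both ages. For part (c), what matters is whether the age lies inside the range of ages the researchers actually observed, which is not listed here.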

Correlation and regression

There is a close connection between correlation and regression.

  • Use r^2 as a measure of how successfully the regression explains the response.
  • Interpret r^2 as the fraction of the variation in y that is explained by the least-squares regression of y on x.
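The interpretation of r^2 as explained variation can be verified on a small example. The sketch below uses hypothetical data and compares the square of the correlation with 1 − SSE/SST, where SSE is the sum of squared errors around the fitted line and SST is the total sum of squares of y around its mean. These abbreviations and all variable names are assumptions of this example.

```python
import math

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
sst = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Correlation between x and y.
r = sxy / math.sqrt(sxx * sst)

# Least-squares line.
b = sxy / sxx
a = y_bar - b * x_bar

# Variation left unexplained by the line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

explained_fraction = 1 - sse / sst
print(f"r^2       = {r ** 2:.6f}")
print(f"1-SSE/SST = {explained_fraction:.6f}")
```

The two printed values agree up to rounding, which is the sense in which r^2 measures how much of the variation in the response the regression accounts for.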

Example 2.3.3 continued. In the Kalama growth data example, r^2 = 0.988. Because r^2 is very close to 1, the regression line describes the data almost perfectly!

In other words, the explanatory variable x (age in months) can “explain” 98.8% of the variability of the response variable y (mean height).

Example 2.3.4 A study of class attendance and grades among first-year students at a state university showed that, in general, students who attended a higher percentage of their classes earned higher grades. Class attendance explained 16% of the variation in grade index among the students. What is the numerical value of the correlation between percent of classes attended and grade index?
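Going from r^2 back to r takes a square root plus a decision about the sign, which comes from the direction of the association. The sketch below is one way to write that step; the function name `correlation_from_r_squared` and its boolean argument are assumptions of this example.

```python
import math

def correlation_from_r_squared(r_squared, positive_association):
    """Recover r from r^2, taking the sign from the direction of the association."""
    r = math.sqrt(r_squared)
    return r if positive_association else -r

# Example 2.3.4: attendance explains 16% of the variation, and higher
# attendance goes with higher grades, so the association is positive.
r_attendance = correlation_from_r_squared(0.16, positive_association=True)

# Example 2.3.3: r^2 = 0.988 and the fitted slope 0.635 is positive.
r_kalama = correlation_from_r_squared(0.988, positive_association=True)

print(f"attendance vs. grade index: r = {r_attendance:.2f}")
print(f"Kalama age vs. mean height: r = {r_kalama:.3f}")
```

The sign cannot be read from r^2 alone; in both examples it comes from the stated direction of the relationship.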