









Material Type: Notes; Class: STATISTICAL INFERENCE; Subject: Statistics; University: University of Pennsylvania; Term: Unknown 1989;










defects. A Teflon tube (catheter) is passed into a major vein or artery and pushed into the heart to obtain information about its functional ability.
catheter has reached.
reach the heart (the length of the catheter, Y ) based on something easily observed about the child, such as height ( X ).
required was determined by using fluoroscopy to check that the tip of the catheter had reached the pulmonary artery.
(X₁, Y₁), ..., (X₁₂, Y₁₂).
height. Making inferences about one variable Y based on observations of a second variable X is called regression analysis.
X = x. Catheter length is not exactly the same for all children of height x : it has a population distrn.
one for each x.
each level of the grouping factor X.
then knowing X would tell us nothing about Y. But, you’ll see it a lot.
observe X = x .”
is the mean of the catheter length population distrn associated with the height 48 inches: E [ Y | X = 48]. This is called the conditional mean of Y given X = 48.
X. (Devore calls this μ_{Y·x}; that’s bizarre notation.) We need to make assumptions about the regression function.
estimate, to a problem with only two parameters to estimate.
\[
\mu(x) = E[Y \mid X = x] = \beta_0 + \beta_1 x.
\]
This is a strong assumption! When we make it, we are doing (simple) linear regression , and μ( x ) is called the regression line.
means that are close together. The specific realization of that assumption is a straight-line relationship between x and μ( x ).
height leads to a 0.6-centimeter increase in the expected catheter length.
Bivariate Fit of Catheter length required By Height
Linear Fit Catheter length = 12.124045 + 0.5967612 Height
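As a small illustration of how the fitted line is used, the sketch below plugs a height into the reported equation. The function and constant names are hypothetical; the coefficients are copied from the fit above, and the units (height in inches, catheter length in centimeters) are assumed from the surrounding notes.

```python
# Coefficients copied from the reported linear fit.
INTERCEPT = 12.124045  # estimated beta_0
SLOPE = 0.5967612      # estimated beta_1 (change in fitted length per unit of height)


def fitted_catheter_length(height):
    """Return the fitted mean catheter length at the given height."""
    return INTERCEPT + SLOPE * height


# Fitted mean at a height of 48, the value used in the conditional-mean example.
print(round(fitted_catheter_length(48.0), 2))
```

This returns the estimated conditional mean at that height, not a guaranteed length for any individual child.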
corresponding residuals, i.e. their sum of squares:
\[
S(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \bigl( y_i - (\hat\beta_0 + \hat\beta_1 x_i) \bigr)^2
\]
least-squares estimates of the linear regression coefficients.
solution always exists and is unique (provided the xᵢ are not all equal). In fact, the solution has a closed form:
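\[
\hat\beta_1^{LS} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{\widehat{\mathrm{Cov}}(X, Y)}{\widehat{\mathrm{Var}}(X)},
\qquad
\hat\beta_0^{LS} = \bar{y} - \hat\beta_1^{LS}\,\bar{x}
\]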
(From now on, we drop the “LS” superscript.)
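The closed-form estimates can be computed directly from the sums in the formula. The following is a minimal sketch; the function name `least_squares_line` and the four data points are made up for illustration and are not the catheter data.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is not identified")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustrative data (hypothetical, not from the study).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b0, b1 = least_squares_line(xs, ys)
print(b0, b1)
```

The guard on `sxx` reflects the one case where the formula breaks down: if every x is the same, the denominator is zero and no unique line exists.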
(β̂₀, β̂₁). Then prove S is a strictly convex function of (β̂₀, β̂₁); this tells you the estimates are the unique global minimizers of S.
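One way the argument can go is sketched below, writing $b_0, b_1$ for candidate coefficient values and assuming the $x_i$ are not all equal. Setting the partial derivatives of $S$ to zero gives the normal equations:

\[
\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0,
\qquad
\frac{\partial S}{\partial b_1} = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0.
\]

The first equation gives $b_0 = \bar{y} - b_1 \bar{x}$. Substituting into the second, and using $\sum_i (x_i - \bar{x}) = 0$ and $\sum_i (y_i - \bar{y}) = 0$,

\[
b_1 = \frac{\sum_i x_i (y_i - \bar{y})}{\sum_i x_i (x_i - \bar{x})} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}.
\]

For convexity, the Hessian of $S$ is constant:

\[
H = 2 \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},
\qquad
\det H = 4 \Bigl( n \sum_i x_i^2 - \bigl( \textstyle\sum_i x_i \bigr)^2 \Bigr) = 4 n \sum_i (x_i - \bar{x})^2 > 0.
\]

Since the leading entry $2n$ is positive and the determinant is positive, $H$ is positive definite, so $S$ is strictly convex and the stationary point is its unique global minimizer.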
estimates of the regression coefficients. We just said that minimizing the residual sum of squares seems reasonable.
estimates are, in a certain sense, optimal.
(β̂₀, β̂₁), and thus make statistical inferences about them.
possible x, (Y | X = x) = β₀ + β₁x + ε, where ε ∼ N(0, σ²). We now assume n independent draws under this model, each using one of the n values x₁, ..., xₙ. This gives the following model for the data: (Yᵢ | Xᵢ = xᵢ) = β₀ + β₁xᵢ + εᵢ, with the εᵢ’s IID N(0, σ²).
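To make the data model concrete, the sketch below simulates responses from it for a fixed set of x values. The function name, parameter values, and x values are all hypothetical choices for illustration.

```python
import random


def simulate_responses(xs, beta0, beta1, sigma, seed=None):
    """Draw one Y for each x from Y = beta0 + beta1 * x + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Each error is drawn independently with mean 0 and standard deviation sigma.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values and design points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_responses(xs, beta0=1.0, beta1=2.0, sigma=0.5, seed=0)
print(ys)
```

Note that the x values are treated as fixed; only the errors, and therefore the responses, are random.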
σ². It is based on the sum of squared residuals, which we denote SSR:
\[
\hat\sigma^2 = \frac{\mathrm{SSR}}{n - 2} = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}
\]
Notice that there are n − 2 degrees of freedom left over to estimate the variance, because we used two df for (β̂₀, β̂₁).
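The variance estimate is simple to compute once the fitted coefficients are in hand. Below is a minimal sketch; the function name and the data are hypothetical, and the coefficients passed in are the least-squares estimates for that made-up data.

```python
def residual_variance(xs, ys, b0, b1):
    """Estimate sigma^2 as SSR / (n - 2) for the fitted line b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n <= 2:
        raise ValueError("need equal-length samples with more than 2 points")
    # Sum of squared residuals: observed minus fitted values.
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return ssr / (n - 2)


# Made-up illustrative data with its least-squares coefficients.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
print(residual_variance(xs, ys, b0=1.15, b1=1.94))
```

Dividing by n − 2 rather than n is what accounts for the two coefficients already estimated from the same data.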
σ̂².
than any other linear unbiased estimates of (β₀, β₁); linear here means estimates which are linear functions of y₁, ..., yₙ. They are BLUE: the “best linear unbiased estimates.”
mean response μ̂(x).
the range of observed x ’s.
predict mean catheter length required for a child who is 42.0 inches tall, a height not observed in the sample.)
the range of observed x ’s. This is seldom reasonable. A straight-line model may hold approximately over the region of observed x ’s, but not for all X.
(Ouch.)
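One way to guard against accidental extrapolation is to have the prediction routine refuse x values outside the observed range. The sketch below assumes this design; the function name is hypothetical, and the range endpoints in the usage lines are placeholders, not the study's actual minimum and maximum heights. The coefficients are the ones from the reported fit.

```python
def predict_mean(x, b0, b1, x_min, x_max):
    """Return the fitted mean b0 + b1 * x, but only inside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x={x} is outside the observed range [{x_min}, {x_max}]; "
            "the straight-line model may not hold there"
        )
    return b0 + b1 * x


# Placeholder range (hypothetical); replace with the observed min and max heights.
X_MIN, X_MAX = 20.0, 65.0
print(round(predict_mean(42.0, 12.124045, 0.5967612, X_MIN, X_MAX), 2))
```

Raising an error is a deliberately strict choice; a warning would also work if extrapolated values are sometimes wanted for exploration.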