















































Material Type: Notes; Class: THRY PROBAB &STAT I; Subject: STATISTICS; University: Iowa State University; Term: Spring 2008;
















































Abstract This outline summarizes the main points of lectures based on Ken Koehler’s class notes and other sources.
The basic linear model structure is
$$Y = X\beta + \epsilon \qquad (1)$$
for $Y$ an $n \times 1$ vector of observables, $X$ an $n \times k$ matrix of known constants, $\beta$ a $k \times 1$ vector of (unknown) constants (parameters), and $\epsilon$ an $n \times 1$ vector of unobservable random errors. Almost always one assumes that $E\epsilon = 0$. Often one also assumes that for an unknown constant (a parameter) $\sigma^2 > 0$, $\mathrm{Var}\,\epsilon = \sigma^2 I$ (these are the Gauss-Markov model assumptions), or somewhat more generally that $\mathrm{Var}\,\epsilon = \eta^2 V$ for a known matrix $V$ (these are the Aitken model assumptions). These assumptions can be phrased as “the mean vector $EY$ is in the column space of the matrix $X$ ($EY \in C(X)$) and the variance-covariance matrix $\mathrm{Var}\,Y$ is known up to a multiplicative constant.”
The ordinary least squares estimate for $EY = X\beta$ is made by minimizing
$$(Y - \hat{Y})'(Y - \hat{Y})$$
over choices of $\hat{Y} \in C(X)$. $\hat{Y}$ is then “the (perpendicular) projection of $Y$ onto $C(X)$.” This is minimization of the squared distance between $Y$ and $\hat{Y}$ belonging to $C(X)$. Computation of this projection can be accomplished using a (unique) “projection matrix” $P_X$ as
$$\hat{Y} = P_X Y$$
There are various ways of constructing $P_X$. One is as
$$P_X = X (X'X)^{-} X'$$
for $(X'X)^{-}$ any generalized inverse of $X'X$. As it turns out, $P_X$ is both symmetric and idempotent. It is sometimes called the “hat matrix” and written as $H$ rather than $P_X$. (It is used to compute the “y hats.”) The vector
$$e = Y - \hat{Y} = (I - P_X) Y$$
is the vector of residuals. As it turns out, the matrix $I - P_X$ is also a perpendicular projection matrix. It projects onto the subspace of $R^n$ consisting of all vectors perpendicular to the elements of $C(X)$. That is, $I - P_X$ projects onto
$$C(X)^{\perp} \equiv \{ u \in R^n \mid u'v = 0 \ \forall v \in C(X) \}$$
It is the case that $C(X)^{\perp} = C(I - P_X)$ and
$$\mathrm{rank}(X) = \mathrm{rank}(X'X) = \mathrm{rank}(P_X) = \text{dimension of } C(X) = \mathrm{trace}(P_X)$$
and
$$\mathrm{rank}(I - P_X) = \text{dimension of } C(X)^{\perp} = \mathrm{trace}(I - P_X)$$
and
$$n = \mathrm{rank}(I) = \mathrm{rank}(P_X) + \mathrm{rank}(I - P_X)$$
Further, there is the Pythagorean Theorem/ANOVA identity
$$Y'Y = (P_X Y)'(P_X Y) + ((I - P_X) Y)'((I - P_X) Y) = \hat{Y}'\hat{Y} + e'e$$
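The projection facts above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small made-up design matrix (the data, the group layout, and all variable names are illustrative and not from the notes). It uses the Moore-Penrose inverse as one convenient choice of generalized inverse and a deliberately rank-deficient $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Rank-deficient design on purpose: intercept plus two group indicators
# (the indicator columns sum to the intercept column), so rank(X) = 2 < k = 3.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([np.ones(n), group == 0, group == 1]).astype(float)
Y = rng.normal(size=n)

# The Moore-Penrose inverse is one valid choice of generalized inverse of X'X.
P = X @ np.linalg.pinv(X.T @ X) @ X.T
Y_hat = P @ Y          # projection of Y onto C(X)
e = Y - Y_hat          # residual vector (I - P_X) Y

rank_X = np.linalg.matrix_rank(X)
symmetric = np.allclose(P, P.T)
idempotent = np.allclose(P @ P, P)
trace_equals_rank = np.isclose(np.trace(P), rank_X)
pythagoras = np.isclose(Y @ Y, Y_hat @ Y_hat + e @ e)
residuals_orthogonal = np.allclose(X.T @ e, 0.0)
```

Each boolean corresponds to one of the stated properties: symmetry, idempotence, trace equal to rank, and the Pythagorean/ANOVA identity. The last check confirms that the residuals are perpendicular to every column of $X$.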
When $\mathrm{rank}(X) = k$ (one has a “full rank” $X$) every $w \in C(X)$ has a unique representation as a linear combination of the columns of $X$. In this case there is a unique $b$ that solves
$$Xb = P_X Y = \hat{Y} = \widehat{X\beta} \qquad (2)$$
We can call this solution of equation (2) the ordinary least squares estimate of $\beta$. Notice that we then have
$$X b_{OLS} = P_X Y = X (X'X)^{-} X'Y$$
so that
$$X'X b_{OLS} = X'X (X'X)^{-} X'Y = X'Y$$
(the last equality holds because $X'P_X = X'$). These are the so called “normal equations.” $X'X$ is $k \times k$ with the same rank as $X$ (namely $k$) and is thus non-singular. So $(X'X)^{-} = (X'X)^{-1}$ and the normal equations can be solved to give
$$b_{OLS} = (X'X)^{-1} X'Y$$
When $X$ is not of full rank, there are multiple $b$’s that will solve equation (2) and multiple $\beta$’s that could be used to represent $EY \in C(X)$. There is thus no sensible “least squares estimate of $\beta$.”
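A short numerical illustration of both cases may help. This is a sketch under assumed toy data (a simulated straight-line regression, with hypothetical names such as `X_def`): it solves the normal equations in the full-rank case, then shows that in a rank-deficient case two different coefficient vectors give exactly the same fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])             # full rank, k = 2
Y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Solve the normal equations X'X b = X'Y directly rather than forming (X'X)^{-1}.
b_ols = np.linalg.solve(X.T @ X, X.T @ Y)
b_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]   # library least squares, for comparison

# Append a redundant column (twice the intercept): rank stays 2 while k becomes 3.
X_def = np.column_stack([X, 2.0 * np.ones(n)])
b1 = np.linalg.pinv(X_def) @ Y                   # one solution of X b = P_X Y
b2 = b1 + np.array([2.0, 0.0, -1.0])             # add a vector in the null space of X_def
same_fit = np.allclose(X_def @ b1, X_def @ b2)   # identical fitted values
different_coefficients = not np.allclose(b1, b2)
```

Solving the linear system is generally preferred to explicit inversion for numerical reasons. The second half shows why only functions of $X\beta$ can sensibly be estimated when $X$ is not of full rank.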
and talk about estimating the vector
$$C\beta = \begin{pmatrix} c_1'\beta \\ c_2'\beta \\ \vdots \\ c_l'\beta \end{pmatrix}$$
An ordinary least squares estimator of $C\beta$ is then
$$\widehat{C\beta}_{OLS} = C (X'X)^{-} X'Y$$
Related to the notion of the estimability of $C\beta$ is the concept of “testability” of hypotheses. Roughly speaking, several hypotheses like $H_0: c_i'\beta = \#$ are (simultaneously) testable if each $c_i'\beta$ can be estimated and the hypotheses are not internally inconsistent. To be more precise, suppose that (as above) $C$ is an $l \times k$ matrix of constants.
Definition 3 For a matrix $C$ of the form (4), the hypothesis $H_0: C\beta = d$ is testable provided each $c_i'\beta$ is estimable and $\mathrm{rank}(C) = l$.
Cases of testing $H_0: C\beta = 0$ are of particular interest. If such a hypothesis is testable, $\exists\, a_i \in R^n$ such that $c_i' = a_i'X$ for each $i$, and thus with
$$\underset{l \times n}{A} = \begin{pmatrix} a_1' \\ a_2' \\ \vdots \\ a_l' \end{pmatrix}$$
one can write
$$C = AX$$
Then the basic linear model says that $EY \in C(X)$ while the hypothesis says that $EY \in C(A')^{\perp}$. $C(X) \cap C(A')^{\perp}$ is a subspace of $C(X)$ of dimension
$$\mathrm{rank}(X) - \mathrm{rank}(A') = \mathrm{rank}(X) - l$$
and the hypothesis can thus be thought of in terms of specifying that the mean vector is in a subspace of $C(X)$.
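Elementary rules about how means and variances of linear combinations of random variables are computed can be applied to find means and variances for OLS estimators. Under the Gauss-Markov Model some of these are
$$E\hat{Y} = X\beta \quad \text{and} \quad \mathrm{Var}\,\hat{Y} = \sigma^2 P_X$$
$$Ee = 0 \quad \text{and} \quad \mathrm{Var}\,e = \sigma^2 (I - P_X)$$
Further, for $l$ estimable functions $c_1'\beta, c_2'\beta, \ldots, c_l'\beta$ and
$$C = \begin{pmatrix} c_1' \\ c_2' \\ \vdots \\ c_l' \end{pmatrix}$$
the estimator $\widehat{C\beta}_{OLS} = C (X'X)^{-} X'Y$ has mean and covariance matrix
$$E\,\widehat{C\beta}_{OLS} = C\beta \quad \text{and} \quad \mathrm{Var}\,\widehat{C\beta}_{OLS} = \sigma^2 C (X'X)^{-} C'$$
Notice that in the case that $X$ is full rank and $\beta = I\beta$ is estimable, the above says that
$$E\,b_{OLS} = \beta \quad \text{and} \quad \mathrm{Var}\,b_{OLS} = \sigma^2 (X'X)^{-1}$$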
It is possible to use Theorem 5.2A of Rencher about the mean of a quadratic form to also argue that
$$E\,e'e = E\,(Y - \hat{Y})'(Y - \hat{Y}) = \sigma^2 (n - \mathrm{rank}(X))$$
This fact suggests the ratio
$$MSE = \frac{e'e}{n - \mathrm{rank}(X)}$$
as an obvious estimate of $\sigma^2$.
In the Gauss-Markov model, ordinary least squares estimation has some optimality properties. Foremost there is the guarantee provided by the Gauss-Markov Theorem. This says that under the linear model assumptions with $\mathrm{Var}\,\epsilon = \sigma^2 I$, for estimable $c'\beta$ the ordinary least squares estimator $\widehat{c'\beta}_{OLS}$ is the Best (in the sense of minimizing variance) Linear (in the entries of $Y$) Unbiased (having mean $c'\beta$ for all $\beta$) Estimator of $c'\beta$.
For $V$ positive definite, suppose that $\mathrm{Var}\,\epsilon = \eta^2 V$. There exists a symmetric positive definite square root matrix for $V^{-1}$, call it $V^{-1/2}$. Then
$$U = V^{-1/2} Y$$
satisfies the Gauss-Markov model assumptions with model matrix
$$W = V^{-1/2} X$$
It then makes sense to do ordinary least squares estimation of $EU \in C(W)$ (with $P_W U = \hat{U} = \widehat{W\beta}$). Note that for $c \in C(W')$ the BLUE of the parametric function $c'\beta$ is
$$c'(W'W)^{-} W'U = c'(W'W)^{-} W' V^{-1/2} Y$$
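The transformation to $U$ and $W$ can be sketched in a few lines. The example below is illustrative only: the diagonal $V$, the simulated data, and the variable names are assumptions, and the closed form $(X'V^{-1}X)^{-1}X'V^{-1}Y$ used as a cross-check follows algebraically from $W'W = X'V^{-1}X$ in the full-rank case rather than being quoted from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
x = np.linspace(1.0, 4.0, n)
X = np.column_stack([np.ones(n), x])
V = np.diag(x ** 2)                  # assumed known structure: variance grows with x
Y = 3.0 + 1.5 * x + x * rng.normal(size=n)

# Symmetric positive definite square root of V^{-1}, from the eigendecomposition of V.
vals, vecs = np.linalg.eigh(V)
V_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T

U = V_inv_half @ Y                   # transformed response
W = V_inv_half @ X                   # transformed model matrix
b_transformed = np.linalg.solve(W.T @ W, W.T @ U)   # OLS in the transformed model

# The same estimate written without the square root: (X'V^{-1}X)^{-1} X'V^{-1}Y.
V_inv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
```

The eigendecomposition route works for any symmetric positive definite $V$, not only diagonal ones. For large $n$ a Cholesky factor of $V$ is a common alternative, though it gives a non-symmetric square root.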
What then is there to choose between two fundamentally equivalent linear models? There are two issues. Computational/formula simplicity pushes one in the direction of using full rank versions. Sometimes, scientific interpretability of parameters pushes one in the opposite direction. It must be understood that the set of inferences one can make can ONLY depend on the column space of the model matrix, NOT on how that column space is represented.
If one adds to the basic Gauss-Markov (or Aitken) linear model assumptions an assumption that $\epsilon$ (and therefore $Y$) is multivariate normal, inference formulas (for making confidence intervals, tests and predictions) follow. These are based primarily on two basic results.
Theorem 4 (Koehler 4.7, panel 309. See also Rencher Theorem 5.5.A.) Suppose that $A$ is $n \times n$ and symmetric with $\mathrm{rank}(A) = k$, $Y \sim MVN_n(\mu, \Sigma)$ for $\Sigma$ positive definite. If $A\Sigma$ is idempotent, then
$$Y'AY \sim \chi^2_k(\mu'A\mu)$$
(So, if in addition $A\mu = 0$, then $Y'AY \sim \chi^2_k$.)
Theorem 5 (Theorem 1.3.7 of Christensen) Suppose that $Y \sim MVN(\mu, \sigma^2 I)$ and $BA = 0$. a) If $A$ is symmetric, $Y'AY$ and $BY$ are independent, and b) if both $A$ and $B$ are symmetric, then $Y'AY$ and $Y'BY$ are independent.
(Part b) is Koehler’s 4.8. Part a) is a weaker form of Corollary 1 to Rencher’s Theorem 5.6.A and part b) is a weaker form of Corollary 1 to Rencher’s Theorem 5.6.B.) Here are some implications of these theorems.
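Example 6 In the normal Gauss-Markov model
$$\frac{1}{\sigma^2} (Y - \hat{Y})'(Y - \hat{Y}) = \frac{SSE}{\sigma^2} \sim \chi^2_{n - \mathrm{rank}(X)}$$
This leads, for example, to $1 - \alpha$ level confidence limits for $\sigma^2$
$$\left( \frac{SSE}{\text{upper } \frac{\alpha}{2} \text{ point of } \chi^2_{n - \mathrm{rank}(X)}},\ \frac{SSE}{\text{lower } \frac{\alpha}{2} \text{ point of } \chi^2_{n - \mathrm{rank}(X)}} \right)$$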
Example 7 (Estimation and testing for an estimable function) In the normal Gauss-Markov model, if $c'\beta$ is estimable,
$$\frac{\widehat{c'\beta}_{OLS} - c'\beta}{\sqrt{MSE}\,\sqrt{c'(X'X)^{-}c}} \sim t_{n - \mathrm{rank}(X)}$$
This implies that $H_0: c'\beta = \#$ can be tested using the statistic
$$\frac{\widehat{c'\beta}_{OLS} - \#}{\sqrt{MSE}\,\sqrt{c'(X'X)^{-}c}}$$
and a $t_{n - \mathrm{rank}(X)}$ reference distribution. Further, if $t$ is the upper $\frac{\alpha}{2}$ point of the $t_{n - \mathrm{rank}(X)}$ distribution, $1 - \alpha$ level two-sided confidence limits for $c'\beta$ are
$$\widehat{c'\beta}_{OLS} \pm t\,\sqrt{MSE}\,\sqrt{c'(X'X)^{-}c}$$
Example 8 (Prediction) In the normal Gauss-Markov model, suppose that $c'\beta$ is estimable and $y^* \sim N(c'\beta, \gamma\sigma^2)$ independent of $Y$ is to be observed. (We assume that $\gamma$ is known.) Then
$$\frac{\widehat{c'\beta}_{OLS} - y^*}{\sqrt{MSE}\,\sqrt{\gamma + c'(X'X)^{-}c}} \sim t_{n - \mathrm{rank}(X)}$$
This means that if $t$ is the upper $\frac{\alpha}{2}$ point of the $t_{n - \mathrm{rank}(X)}$ distribution, $1 - \alpha$ level two-sided prediction limits for $y^*$ are
$$\widehat{c'\beta}_{OLS} \pm t\,\sqrt{MSE}\,\sqrt{\gamma + c'(X'X)^{-}c}$$
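The confidence and prediction limits of Examples 7 and 8 differ only in the $\gamma$ term, so one helper can cover both. What follows is a hedged sketch: the function name `limits_for_estimable`, the simulated data, and the choice $\gamma = 1$ are all assumptions made for illustration, and the $t$ quantile is passed in as a tabulated constant rather than computed.

```python
import numpy as np

def limits_for_estimable(X, Y, c, t_upper, gamma=0.0):
    """Return (estimate, lower, upper) for c'beta; gamma > 0 gives prediction limits."""
    XtX_ginv = np.linalg.pinv(X.T @ X)              # a generalized inverse of X'X
    estimate = c @ XtX_ginv @ X.T @ Y
    resid = Y - X @ XtX_ginv @ X.T @ Y
    df = X.shape[0] - np.linalg.matrix_rank(X)      # n - rank(X)
    mse = resid @ resid / df
    half_width = t_upper * np.sqrt(mse) * np.sqrt(gamma + c @ XtX_ginv @ c)
    return estimate, estimate - half_width, estimate + half_width

rng = np.random.default_rng(3)
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones(10), x])
Y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=10)
c = np.array([1.0, 4.5])             # c'beta is the mean response at x = 4.5
T_975_8DF = 2.306                    # tabulated upper 0.025 point of t with 8 df

ci = limits_for_estimable(X, Y, c, T_975_8DF)              # 95% confidence limits
pi = limits_for_estimable(X, Y, c, T_975_8DF, gamma=1.0)   # 95% prediction limits
```

Because $x = 4.5$ is the mean of the predictor values here, the point estimate coincides with the sample mean of $Y$, which makes a convenient sanity check. The prediction limits are necessarily wider than the confidence limits.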
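Example 9 (Testing) In the normal Gauss-Markov model, suppose that the hypothesis $H_0: C\beta = d$ is testable. Then with
$$SSH_0 = (\widehat{C\beta}_{OLS} - d)' \left( C (X'X)^{-} C' \right)^{-1} (\widehat{C\beta}_{OLS} - d)$$
it’s easy to see that
$$\frac{SSH_0}{\sigma^2} \sim \chi^2_l(\delta^2)$$
for
$$\delta^2 = \frac{1}{\sigma^2} (C\beta - d)' \left( C (X'X)^{-} C' \right)^{-1} (C\beta - d)$$
This in turn implies that
$$\frac{SSH_0 / l}{MSE} \sim F_{l,\,n - \mathrm{rank}(X)}(\delta^2)$$
So with $f$ the upper $\alpha$ point of the $F_{l,\,n - \mathrm{rank}(X)}$ distribution, an $\alpha$ level test of $H_0: C\beta = d$ can be made by rejecting if $\frac{SSH_0 / l}{MSE} > f$. The power of this test is
$$\mathrm{power}(\delta^2) = P\left[ \text{an } F_{l,\,n - \mathrm{rank}(X)}(\delta^2) \text{ random variable} > f \right]$$
Or taking a significance testing point of view, a p-value for testing this hypothesis is
$$P\left[ \text{an } F_{l,\,n - \mathrm{rank}(X)} \text{ random variable exceeds the observed value of } \frac{SSH_0 / l}{MSE} \right]$$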
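As an illustration of Example 9, the sketch below computes $SSH_0$ and the $F$ ratio for a made-up one-way layout written in a non-full-rank effects parameterization. The data, group sizes, and names are assumptions. The comparison with the familiar between-group sum of squares is included only as a sanity check for this particular hypothesis.

```python
import numpy as np

rng = np.random.default_rng(4)
groups = np.repeat([0, 1, 2], 4)                      # 3 groups, 4 observations each
n = groups.size
# Non-full-rank effects parameterization: columns for mu, alpha_1, alpha_2, alpha_3.
X = np.column_stack([np.ones(n)] + [(groups == g).astype(float) for g in range(3)])
Y = np.array([5.0, 6.0, 8.0])[groups] + rng.normal(size=n)

# H0: alpha_1 - alpha_2 = 0 and alpha_2 - alpha_3 = 0 (each row estimable, rank(C) = 2).
C = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
d = np.zeros(2)

G = np.linalg.pinv(X.T @ X)                           # a generalized inverse of X'X
Cb = C @ G @ X.T @ Y                                  # OLS estimate of C beta
SSH0 = (Cb - d) @ np.linalg.solve(C @ G @ C.T, Cb - d)

P = X @ G @ X.T
rank_X = np.linalg.matrix_rank(X)
MSE = Y @ (np.eye(n) - P) @ Y / (n - rank_X)
l = C.shape[0]
F = (SSH0 / l) / MSE                                  # refer to an F(l, n - rank(X)) distribution

# Sanity check: for this hypothesis SSH0 is the usual between-group sum of squares.
group_means = np.array([Y[groups == g].mean() for g in range(3)])
SS_between = np.sum(4 * (group_means - Y.mean()) ** 2)
```

The observed `F` would be compared with the upper $\alpha$ point of the $F_{2,9}$ distribution, taken from a table or a statistics library.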
Unless $n$ is small or one is very unlucky, in regression contexts $X$ is of full rank (i.e. is of rank $r + 1$). A few specifics of what has gone before that are of particular interest in the regression context are as follows. As always $\hat{Y} = P_X Y$ and in the regression context it is especially common to call $P_X = H$ the hat matrix. It is $n \times n$ and its diagonal entries $h_{ii}$ are sometimes used as indices of “influence” or “leverage” of a particular case on the regression fit. It is the case that each $h_{ii} \geq 0$ and
$$\sum_i h_{ii} = \mathrm{trace}(H) = \mathrm{rank}(H) = r + 1$$
so that the “hats” $h_{ii}$ average to $\frac{r+1}{n}$. In light of this, a case with $h_{ii} > \frac{2(r+1)}{n}$ is sometimes flagged as an “influential” case. Further
$$\mathrm{Var}\,\hat{Y} = \sigma^2 P_X = \sigma^2 H$$
so that $\mathrm{Var}\,\hat{y}_i = h_{ii}\sigma^2$ and an estimated standard deviation of $\hat{y}_i$ is
$$\sqrt{MSE}\,\sqrt{h_{ii}}$$
(This is useful, for example, in making confidence intervals for $Ey_i$.) Also as always, $e = (I - P_X) Y$ and $\mathrm{Var}\,e = \sigma^2 (I - P_X) = \sigma^2 (I - H)$. So $\mathrm{Var}\,e_i = (1 - h_{ii})\sigma^2$ and it is typical to compute and plot standardized versions of the residuals
$$e_i^* = \frac{e_i}{\sqrt{MSE}\,\sqrt{1 - h_{ii}}}$$
The general testing of hypothesis framework discussed in Section 1.6 has a particularly important specialization in regression contexts. That is, it is common in regression contexts (for $p < r$) to test
$$H_0: \beta_{p+1} = \beta_{p+2} = \cdots = \beta_r = 0 \qquad (6)$$
and in first methods courses this is done using the “full model/reduced model” paradigm. With
$$X_i = (1 \mid x_1 \mid x_2 \mid \cdots \mid x_i)$$
this is the hypothesis
$$H_0: EY \in C(X_p)$$
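The leverage and standardized-residual computations described earlier in this passage are simple to carry out. Here is a minimal sketch, assuming simulated data in which one case is deliberately placed far from the others in predictor space (the data and names are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 15, 2
predictors = rng.normal(size=(n, r))
predictors[0] = [6.0, -6.0]                 # one case far from the others in predictor space
X = np.column_stack([np.ones(n), predictors])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix for full-rank X
h = np.diag(H)                              # leverages h_ii
e = Y - H @ Y                               # residuals
MSE = e @ e / (n - (r + 1))

flagged = np.where(h > 2 * (r + 1) / n)[0]  # cases with h_ii > 2(r+1)/n
se_fit = np.sqrt(MSE * h)                   # estimated standard deviation of each fitted value
e_std = e / np.sqrt(MSE * (1.0 - h))        # standardized residuals
```

The leverages depend only on $X$, so a case can be flagged before any response is observed. For large $n$ one would usually avoid forming the full $n \times n$ matrix and compute only its diagonal.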
It is also possible to write this hypothesis in the standard form $H_0: C\beta = 0$ using the matrix
$$\underset{(r-p) \times (r+1)}{C} = \left( \underset{(r-p) \times (p+1)}{0} \ \Big|\ \underset{(r-p) \times (r-p)}{I} \right)$$
So from Section 1.6 the hypothesis can be tested using an F test with numerator sum of squares
$$SSH_0 = (C b_{OLS})' \left( C (X'X)^{-1} C' \right)^{-1} (C b_{OLS})$$
What is interesting and perhaps not initially obvious is that
$$SSH_0 = Y'(P_X - P_{X_p}) Y \qquad (7)$$
and that this kind of sum of squares is the elementary $SSR_{full} - SSR_{reduced}$. (A proof of the equivalence (7) is on a handout posted on the course web page.) Further, the sum of squares in display (7) can be made part of any number of interesting partitions of the (uncorrected) overall sum of squares $Y'Y$. For example, it is clear that
$$Y'Y = Y'\left( P_1 + (P_{X_p} - P_1) + (P_X - P_{X_p}) + (I - P_X) \right) Y \qquad (8)$$
so that
$$Y'Y - Y'P_1 Y = Y'\left( (P_{X_p} - P_1) + (P_X - P_{X_p}) + (I - P_X) \right) Y$$
In elementary regression analysis notation
$$\begin{aligned} Y'Y - Y'P_1 Y &= SSTot \text{ (corrected)} \\ Y'(P_{X_p} - P_1) Y &= SSR_{reduced} \\ Y'(P_X - P_{X_p}) Y &= SSR_{full} - SSR_{reduced} \\ Y'(I - P_X) Y &= SSE_{full} \end{aligned}$$
(and then of course $Y'(P_X - P_1) Y = SSR_{full}$). These four sums of squares are often arranged in an ANOVA table for testing the hypothesis (6). It is common in regression analysis to use “reduction in sums of squares” notation and write
$$\begin{aligned} R(\beta_0) &= Y'P_1 Y \\ R(\beta_1, \ldots, \beta_p \mid \beta_0) &= Y'(P_{X_p} - P_1) Y \\ R(\beta_{p+1}, \ldots, \beta_r \mid \beta_0, \beta_1, \ldots, \beta_p) &= Y'(P_X - P_{X_p}) Y \end{aligned}$$
so that in this notation, identity (8) becomes
$$Y'Y = R(\beta_0) + R(\beta_1, \ldots, \beta_p \mid \beta_0) + R(\beta_{p+1}, \ldots, \beta_r \mid \beta_0, \beta_1, \ldots, \beta_p) + SSE$$
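Both the equivalence (7) and the partition of $Y'Y$ can be verified numerically. The sketch below uses simulated data and a hypothetical helper `proj` (neither is from the notes), and computes the numerator sum of squares two ways: from projection matrices and from the $C\beta = 0$ form.

```python
import numpy as np

def proj(M):
    """Perpendicular projection matrix onto the column space of M."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T

rng = np.random.default_rng(6)
n, r, p = 30, 4, 2
Z = rng.normal(size=(n, r))
X = np.column_stack([np.ones(n), Z])        # full model: 1, x_1, ..., x_r
Xp = X[:, : p + 1]                          # reduced model: 1, x_1, ..., x_p
Y = X @ np.array([1.0, 0.5, -0.5, 0.0, 0.0]) + rng.normal(size=n)

P1, PXp, PX = proj(np.ones((n, 1))), proj(Xp), proj(X)
I = np.eye(n)

R_b0 = Y @ P1 @ Y                           # R(beta_0)
R_reduced = Y @ (PXp - P1) @ Y              # R(beta_1..beta_p | beta_0) = SSR_reduced
R_extra = Y @ (PX - PXp) @ Y                # SSR_full - SSR_reduced
SSE = Y @ (I - PX) @ Y

# The same numerator sum of squares from the C beta = 0 form, with C = (0 | I).
C = np.hstack([np.zeros((r - p, p + 1)), np.eye(r - p)])
b = np.linalg.solve(X.T @ X, X.T @ Y)
SSH0 = (C @ b) @ np.linalg.solve(C @ np.linalg.inv(X.T @ X) @ C.T, C @ b)

F = (R_extra / (r - p)) / (SSE / (n - (r + 1)))   # refer to F(r - p, n - r - 1)
```

The projection form is usually the easier one to reason about, while the $C\beta$ form generalizes to hypotheses that are not simply "drop these columns."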
And in fact, even more elaborate breakdowns of the overall sum of squares are possible. For example,
$$\begin{aligned} R(\beta_0) &= Y'P_1 Y \\ R(\beta_1 \mid \beta_0) &= Y'(P_{X_1} - P_1) Y \\ R(\beta_2 \mid \beta_0, \beta_1) &= Y'(P_{X_2} - P_{X_1}) Y \\ &\ \ \vdots \\ R(\beta_r \mid \beta_0, \beta_1, \ldots, \beta_{r-1}) &= Y'(P_X - P_{X_{r-1}}) Y \end{aligned}$$
represents a “Type I” or “Sequential” sum of squares breakdown of $Y'Y - SSE$. (Note that these sums of squares are appropriate numerator sums of squares for testing significance of individual $\beta$’s in models that include terms only up to the one in question.) The enterprise of trying to assign a sum of squares to a predictor variable strikes Vardeman as of little real interest, but is nevertheless a common one. Rather than think of
$$R(\beta_i \mid \beta_0, \beta_1, \ldots, \beta_{i-1}) = Y'(P_{X_i} - P_{X_{i-1}}) Y$$
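Each of these is a linear combination of the $I \times J$ means $\mu_{ij}$ and is thus estimable. So are the linear combinations of them
$$\alpha_i = \mu_{i.} - \mu_{..}, \quad \beta_j = \mu_{.j} - \mu_{..}, \quad \text{and} \quad \alpha\beta_{ij} = \mu_{ij} - (\mu_{..} + \alpha_i + \beta_j) \qquad (10)$$
The “factorial effects” (10) here are particular (estimable) linear combinations of the cell means. It is a consequence of how these are defined that
$$\sum_i \alpha_i = 0, \quad \sum_j \beta_j = 0, \quad \sum_i \alpha\beta_{ij} = 0 \ \forall j, \quad \text{and} \quad \sum_j \alpha\beta_{ij} = 0 \ \forall i \qquad (11)$$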
An issue of particular interest in two way factorials is whether the hypothesis
$$H_0: \alpha\beta_{ij} = 0 \ \forall i \text{ and } j \qquad (12)$$
is tenable. (If it is, great simplification of interpretation is possible ... changing levels of one factor has the same impact on mean response regardless of which level of the second factor is considered.) This hypothesis can be equivalently written as
$$\mu_{ij} = \mu_{..} + \alpha_i + \beta_j \ \forall i \text{ and } j$$
or as
$$(\mu_{ij} - \mu_{ij'}) - (\mu_{i'j} - \mu_{i'j'}) = 0 \ \forall i, i', j \text{ and } j'$$
and is a statement of “parallelism” on “interaction plots” of means. To test this, one could write the hypothesis in terms of $(I - 1)(J - 1)$ statements
$$\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu_{..} = 0$$
about the cell means and use the machinery for testing $H_0: C\beta = d$ from Example 9. In this case, $d = 0$ and the test is about $EY$ falling in some subspace of $C(X)$. For thinking about the nature of this subspace and issues related to the hypothesis (12), it is probably best to back up and consider an alternative to the cell means model approach. Rather than begin with the cell means model, one might instead begin with the non-full-rank “effects model”
$$y_{ijk} = \mu^* + \alpha_i^* + \beta_j^* + \alpha\beta_{ij}^* + \epsilon_{ijk} \qquad (13)$$
I have put stars on the parameters to make clear that this is something different from beginning with cell means and defining effects as linear combinations of them. Here there are $k = 1 + I + J + IJ$ parameters for the means and only $IJ$ different means. A model including all of these parameters cannot be of full rank. To get simple computations/formulas, one must impose some restrictions. There are several possibilities. In the first place, the facts (11) suggest the so called “sum restrictions” in the effects model (13)
$$\sum_i \alpha_i^* = 0, \quad \sum_j \beta_j^* = 0, \quad \sum_i \alpha\beta_{ij}^* = 0 \ \forall j, \quad \text{and} \quad \sum_j \alpha\beta_{ij}^* = 0 \ \forall i$$
Alternative restrictions are so-called “baseline restrictions.” SAS uses the baseline restrictions
$$\alpha_I^* = 0, \quad \beta_J^* = 0, \quad \alpha\beta_{Ij}^* = 0 \ \forall j, \quad \text{and} \quad \alpha\beta_{iJ}^* = 0 \ \forall i$$
while R and Splus use the baseline restrictions
$$\alpha_1^* = 0, \quad \beta_1^* = 0, \quad \alpha\beta_{1j}^* = 0 \ \forall j, \quad \text{and} \quad \alpha\beta_{i1}^* = 0 \ \forall i$$
Under any of these sets of restrictions one may write a full rank model matrix as
$$\underset{n \times IJ}{X} = \left( \underset{n \times 1}{1} \ \Big|\ \underset{n \times (I-1)}{X_{\alpha^*}} \ \Big|\ \underset{n \times (J-1)}{X_{\beta^*}} \ \Big|\ \underset{n \times (I-1)(J-1)}{X_{\alpha\beta^*}} \right)$$
and the no interaction hypothesis (12) is the hypothesis $H_0: EY \in C((1 \mid X_{\alpha^*} \mid X_{\beta^*}))$. So using the full model/reduced model paradigm from the regression discussion, one then has an appropriate numerator sum of squares
$$SSH_0 = Y'\left( P_X - P_{(1 \mid X_{\alpha^*} \mid X_{\beta^*})} \right) Y$$
and numerator degrees of freedom $(I - 1)(J - 1)$ (in complete factorials where every $n_{ij} > 0$). Other hypotheses sometimes of interest are
$$H_0: \alpha_i = 0 \ \forall i \quad \text{or} \quad H_0: \beta_j = 0 \ \forall j \qquad (14)$$
These are the hypotheses that all row averages of cell means are the same and that all column averages of cell means are the same. That is, these hypotheses could be written as
$$H_0: \mu_{i.} - \mu_{i'.} = 0 \ \forall i, i' \quad \text{or} \quad H_0: \mu_{.j} - \mu_{.j'} = 0 \ \forall j, j'$$
It is possible to write the first of these in the cell means model as $H_0: C\beta = 0$ for $C$ that is $(I - 1) \times k$ and each row of $C$ specifying $\alpha_i = 0$ for one of $i = 1, 2, \ldots, (I - 1)$ (or equality of two row average means). Similarly, the second can be written in the cell means model as $H_0: C\beta = 0$ for $C$ that is $(J - 1) \times k$ and each row of $C$ specifying $\beta_j = 0$ for one of $j = 1, 2, \ldots, (J - 1)$ (or equality of two column average means). Appropriate numerator sums of squares and degrees of freedom for testing these hypotheses are then obvious using the material of Example 9. These sums of squares are often referred to as “Type III” sums of squares.
How to interpret standard partitions of sums of squares and to relate them to tests of hypotheses (12) and (14) is problematic unless all “cell” sample sizes are the same (all $n_{ij} = m$, the data are “balanced”). That is, depending upon what kind of partition one asks for in a call of a standard two-way ANOVA routine, the program produces the following breakdowns
That is, let $\underset{n \times k}{X}$ be the cell means model matrix (for $k$ “full” cells) and
$$X^* = \left( \underset{n \times 1}{1} \ \Big|\ \underset{n \times (I-1)}{X^*_{\alpha^*}} \ \Big|\ \underset{n \times (J-1)}{X^*_{\beta^*}} \right)$$
be an appropriate restricted version of an effects model model matrix (with no interaction terms). If the pattern of empty cells is such that $X^*$ is full rank (has rank $I + J - 1$), the hypothesis (15) can be tested using
$$\frac{Y'(P_X - P_{X^*}) Y / (k - (I + J - 1))}{Y'(I - P_X) Y / (n - k)}$$
and an $F_{(k - (I + J - 1)),\,(n - k)}$ reference distribution. Further, every
$$\mu^* + \alpha_i^* + \beta_j^*$$
is estimable in the no interaction effects model. Provided this model extends to all $I \times J$ combinations of levels of A and B, this provides estimates of mean responses for all cells. (Note that this is essentially the same kind of extrapolation one does in a regression context to sets of predictors not in the original data set. However, on an intuitive basis, the link supporting extrapolation is probably stronger with quantitative regressors than it is with the qualitative predictors of the present context.)
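To make the incomplete-factorial test concrete, here is a sketch for a hypothetical $3 \times 3$ layout with one empty cell. The cell pattern, the simulated responses, and the use of "first level dropped" baseline restrictions are all assumptions chosen for illustration.

```python
import numpy as np

def proj(M):
    """Perpendicular projection matrix onto the column space of M."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T

# (i, j) factor levels for each observation in a 3 x 3 layout; cell (2, 2) is empty.
cells = [(0, 0), (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 1),
         (1, 2), (2, 0), (2, 0), (2, 1), (2, 1)]
a = np.array([c[0] for c in cells])
b = np.array([c[1] for c in cells])
n, I, J = len(cells), 3, 3
rng = np.random.default_rng(7)
Y = 10.0 + a + 2.0 * b + rng.normal(size=n)           # additive truth, no interaction

# Cell means model matrix: one indicator column per non-empty cell.
full_cells = sorted(set(cells))
k = len(full_cells)
X = np.array([[float(c == fc) for fc in full_cells] for c in cells])

# No-interaction model under baseline restrictions (first level of each factor dropped).
X_star = np.column_stack(
    [np.ones(n)]
    + [(a == i).astype(float) for i in range(1, I)]
    + [(b == j).astype(float) for j in range(1, J)]
)
assert np.linalg.matrix_rank(X_star) == I + J - 1     # required for the test as stated

PX, PX_star = proj(X), proj(X_star)
num_df = k - (I + J - 1)
den_df = n - k
F = (Y @ (PX - PX_star) @ Y / num_df) / (Y @ (np.eye(n) - PX) @ Y / den_df)

# Estimated mean for the empty cell (2, 2) from the no-interaction fit.
coef = np.linalg.lstsq(X_star, Y, rcond=None)[0]
empty_cell_estimate = coef[0] + coef[2] + coef[4]
```

With one empty cell the numerator degrees of freedom drop from $(I-1)(J-1) = 4$ to $k - (I + J - 1) = 3$. The final line is the extrapolation the text describes: an estimate for a cell with no data, which is only as trustworthy as the no-interaction assumption.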
2 Nonlinear Models
A generalization of the linear model is the (potentially) “nonlinear” model that for $\beta$ a $k \times 1$ vector of (unknown) constants (parameters) and for some function
$$f(x, \beta)$$
that is smooth (differentiable) in the elements of $\beta$, says that what is observed can be represented as
$$y_i = f(x_i, \beta) + \epsilon_i \qquad (16)$$
for each $x_i$ a known vector of constants. (The dimension of $x$ is fixed but basically irrelevant for what follows. In particular, it need not be $k$.) As is typical in the linear model, one usually assumes that $E\epsilon_i = 0 \ \forall i$, and it is also common to assume that for an unknown constant (a parameter) $\sigma^2 > 0$, $\mathrm{Var}\,\epsilon = \sigma^2 I$.
In general (unlike the case when $f(x_i, \beta) = x_i'\beta$ and the model (16) is a linear model) there are typically no explicit formulas for least squares estimation of $\beta$. That is, minimization of
$$g(b) = \sum_{i=1}^{n} (y_i - f(x_i, b))^2 \qquad (17)$$
is a problem in numerical analysis. There are a variety of standard algorithms used for this purpose. They are all based on the fact that a necessary condition for $b_{OLS}$ to be a minimizer of $g(b)$ is that
$$\left. \frac{\partial g}{\partial b_j} \right|_{b = b_{OLS}} = 0 \ \forall j$$
so that in search for an ordinary least squares estimator, one might try to find a simultaneous solution to these $k$ “estimating” equations. A bit of calculus and algebra shows that $b_{OLS}$ must then solve the matrix equation
$$0 = D'(Y - f(X, b)) \qquad (18)$$
where we use the notations
$$\underset{n \times k}{D} = \left( \frac{\partial f(x_i, b)}{\partial b_j} \right) \quad \text{and} \quad \underset{n \times 1}{f(X, b)} = \begin{pmatrix} f(x_1, b) \\ f(x_2, b) \\ \vdots \\ f(x_n, b) \end{pmatrix}$$
In the case of the linear model
$$D = \left( \frac{\partial}{\partial b_j} x_i'b \right) = (x_{ij}) = X \quad \text{and} \quad f(X, b) = Xb$$
so that equation (18) is $0 = X'(Y - Xb)$, i.e. is the set of normal equations $X'Y = X'Xb$.
One of many iterative algorithms for searching for a solution to the equation (18) is the Gauss-Newton algorithm. It proceeds as follows. For
$$b^r = \begin{pmatrix} b_1^r \\ b_2^r \\ \vdots \\ b_k^r \end{pmatrix}$$
the approximate solution produced by the $r$th iteration of the algorithm ($b^0$ is some vector of starting values that must be supplied by the user), let
$$D^r = \left. \left( \frac{\partial f(x_i, b)}{\partial b_j} \right) \right|_{b = b^r}$$
The first order Taylor (linear) approximation to $f(X, \beta)$ at $b^r$ is
$$f(X, \beta) \approx f(X, b^r) + D^r (\beta - b^r)$$
So the nonlinear model $Y = f(X, \beta) + \epsilon$ can be written as
$$Y \approx f(X, b^r) + D^r (\beta - b^r) + \epsilon$$
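The linearized model suggests regressing the working residual $Y - f(X, b^r)$ on $D^r$ by ordinary least squares and adding the fitted coefficient to $b^r$, which is the standard Gauss-Newton update $b^{r+1} = b^r + (D^{r\prime}D^r)^{-1}D^{r\prime}(Y - f(X, b^r))$. The sketch below implements that standard update. The exponential-decay mean function, the simulated data, the starting values, and the stopping rule are all assumptions made for illustration, not taken from the notes.

```python
import numpy as np

def f(x, b):
    """Hypothetical mean function for illustration: b[0] * exp(-b[1] * x)."""
    return b[0] * np.exp(-b[1] * x)

def jacobian(x, b):
    """n x k matrix D with entries d f(x_i, b) / d b_j."""
    return np.column_stack([np.exp(-b[1] * x), -b[0] * x * np.exp(-b[1] * x)])

def gauss_newton(x, y, b_start, max_iter=50, tol=1e-10):
    b = np.asarray(b_start, dtype=float)
    for _ in range(max_iter):
        D = jacobian(x, b)
        resid = y - f(x, b)
        # OLS of the working residual on D gives the step estimating beta - b^r.
        step = np.linalg.lstsq(D, resid, rcond=None)[0]
        b = b + step
        if np.linalg.norm(step) < tol * (1.0 + np.linalg.norm(b)):
            break
    return b

rng = np.random.default_rng(8)
x = np.linspace(0.0, 4.0, 25)
beta_true = np.array([5.0, 0.7])
y = f(x, beta_true) + rng.normal(scale=0.05, size=x.size)

b_ols = gauss_newton(x, y, b_start=[4.0, 0.5])
D_hat = jacobian(x, b_ols)
score = D_hat.T @ (y - f(x, b_ols))     # estimating equation (18); near zero at a solution
```

Plain Gauss-Newton can fail to converge from poor starting values. Practical routines typically add step-halving or damping, which this sketch omits for clarity.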
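... $(D'D)^{-1}$ “typically gets small” with increasing sample size.
... $MSE = \frac{SSE}{n - k} \approx \sigma^2$.
... $(D'D)^{-1} \approx (\hat{D}'\hat{D})^{-1}$ where
$$\hat{D} = \left. \left( \frac{\partial f(x_i, b)}{\partial b_j} \right) \right|_{b = b_{OLS}}$$
... $h(\beta),\ \sigma^2 G (D'D)^{-1} G'$ for
$$\underset{q \times k}{G} = \left. \left( \frac{\partial h_i(b)}{\partial b_j} \right) \right|_{b = \beta}$$
... $\hat{G} = \left. \left( \frac{\partial h_i(b)}{\partial b_j} \right) \right|_{b = b_{OLS}}$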
Using this set of approximations, essentially exactly as in Section 1.6, one can develop inference methods. Some of these are outlined below.
Example 11 (Inference for a single $\beta_j$) From part 1) of Claim 10 we get the approximation
$$\frac{b_{OLS,j} - \beta_j}{\sigma \sqrt{\eta_j}} \ \dot{\sim}\ N(0, 1)$$
for $\eta_j$ the $j$th diagonal entry of $(D'D)^{-1}$. But then from parts 2) and 3) of the claim,
$$\frac{b_{OLS,j} - \beta_j}{\sigma \sqrt{\eta_j}} \approx \frac{b_{OLS,j} - \beta_j}{\sqrt{MSE}\,\sqrt{\hat{\eta}_j}}$$
for $\hat{\eta}_j$ the $j$th diagonal entry of $(\hat{D}'\hat{D})^{-1}$. In the (normal Gauss-Markov) linear model context, this last random variable is in fact $t$ distributed for any $n$. Then both so that the nonlinear model formulas reduce to the linear model formulas, and as a means of making the already very approximate inference formulas somewhat more conservative, it is standard to say
$$\frac{b_{OLS,j} - \beta_j}{\sqrt{MSE}\,\sqrt{\hat{\eta}_j}} \ \dot{\sim}\ t_{n-k}$$
and thus to test $H_0: \beta_j = \#$ using
$$\frac{b_{OLS,j} - \#}{\sqrt{MSE}\,\sqrt{\hat{\eta}_j}}$$
and a $t_{n-k}$ reference distribution, and to use the values
$$b_{OLS,j} \pm t\,\sqrt{MSE}\,\sqrt{\hat{\eta}_j}$$
as confidence limits for $\beta_j$.
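The approximate standard errors and limits of Example 11 need only the fitted coefficients, $MSE$, and $\hat{D}$. The following sketch assumes a hypothetical exponential-decay model with simulated data and uses a tabulated $t$ quantile. The compact fitting loop stands in for whatever nonlinear least squares routine is actually used.

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(0.0, 4.0, 25)
y = 5.0 * np.exp(-0.7 * x) + rng.normal(scale=0.05, size=x.size)

def f(b):
    """Hypothetical mean function evaluated at all x: b[0] * exp(-b[1] * x)."""
    return b[0] * np.exp(-b[1] * x)

def D(b):
    """n x k matrix of partial derivatives of f with respect to b."""
    return np.column_stack([np.exp(-b[1] * x), -b[0] * x * np.exp(-b[1] * x)])

# Compact Gauss-Newton fit (any nonlinear least squares routine would do here).
b = np.array([4.0, 0.5])
for _ in range(50):
    b = b + np.linalg.lstsq(D(b), y - f(b), rcond=None)[0]

n, k = x.size, b.size
MSE = np.sum((y - f(b)) ** 2) / (n - k)
eta_hat = np.diag(np.linalg.inv(D(b).T @ D(b)))   # diagonal of (D_hat' D_hat)^{-1}
se = np.sqrt(MSE) * np.sqrt(eta_hat)              # approximate standard errors

T_975_23DF = 2.069                                # tabulated upper 0.025 point of t with 23 df
lower, upper = b - T_975_23DF * se, b + T_975_23DF * se
t_stat_b1 = (b[1] - 0.0) / se[1]                  # statistic for H0: second coefficient = 0
```

These limits are large-sample approximations, so they deserve more caution than their linear-model counterparts when $n$ is small or the mean function is strongly curved in $\beta$.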
Example 12 (Inference for a univariate function of $\beta$, including a single mean response) For $h$ that maps $R^k \to R^1$ consider inference for $h(\beta)$ (with application to $f(x, \beta)$ for a given set of predictor variables $x$). Facts 4) and 5) of Claim 10 suggest that
$$\frac{h(b_{OLS}) - h(\beta)}{\sqrt{MSE}\,\sqrt{\hat{G} (\hat{D}'\hat{D})^{-1} \hat{G}'}} \ \dot{\sim}\ t_{n-k}$$
This leads (as in the previous application/example) to testing $H_0: h(\beta) = \#$ using
$$T = \frac{h(b_{OLS}) - \#}{\sqrt{MSE}\,\sqrt{\hat{G} (\hat{D}'\hat{D})^{-1} \hat{G}'}}$$
and a $t_{n-k}$ reference distribution, and to use of the values
$$h(b_{OLS}) \pm t\,\sqrt{MSE}\,\sqrt{\hat{G} (\hat{D}'\hat{D})^{-1} \hat{G}'}$$
as confidence limits for $h(\beta)$. For a set of predictor variables $x$, this can then be applied to $h(\beta) = f(x, \beta)$ to produce inferences for the mean response at $x$. That is, with
$$\underset{1 \times k}{G} = \left. \left( \frac{\partial f(x, b)}{\partial b_j} \right) \right|_{b = \beta}$$
and, as expected,
$$\hat{G} = \left. \left( \frac{\partial f(x, b)}{\partial b_j} \right) \right|_{b = b_{OLS}}$$
one may test $H_0: f(x, \beta) = \#$ using
$$\frac{f(x, b_{OLS}) - \#}{\sqrt{MSE}\,\sqrt{\hat{G} (\hat{D}'\hat{D})^{-1} \hat{G}'}}$$
and a $t_{n-k}$ reference distribution, and use the values
$$f(x, b_{OLS}) \pm t\,\sqrt{MSE}\,\sqrt{\hat{G} (\hat{D}'\hat{D})^{-1} \hat{G}'}$$
as confidence limits for $f(x, \beta)$.
Example 13 (Prediction) Suppose that in the future, $y^*$ normal with mean $h(\beta)$ and variance $\gamma\sigma^2$ independent of $Y$ will be observed. (The constant $\gamma$ is assumed to be known.) Approximate prediction limits for $y^*$ are then
$$h(b_{OLS}) \pm t\,\sqrt{MSE}\,\sqrt{\gamma + \hat{G} (\hat{D}'\hat{D})^{-1} \hat{G}'}$$
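Examples 12 and 13 can be sketched together for the mean response $h(\beta) = f(x_0, \beta)$ at a new predictor value. As before, the exponential-decay model, the simulated data, the point $x_0 = 2$, the choice $\gamma = 1$, and the tabulated $t$ quantile are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(10)
x = np.linspace(0.0, 4.0, 25)
y = 5.0 * np.exp(-0.7 * x) + rng.normal(scale=0.05, size=x.size)

def f(x_val, b):
    """Hypothetical mean function: b[0] * exp(-b[1] * x_val)."""
    return b[0] * np.exp(-b[1] * x_val)

def grad_f(x_val, b):
    """Row(s) of partial derivatives of f(x_val, b) with respect to b."""
    e = np.exp(-b[1] * x_val)
    return np.column_stack([e, -b[0] * x_val * e])

b = np.array([4.0, 0.5])
for _ in range(50):                                   # compact Gauss-Newton fit
    b = b + np.linalg.lstsq(grad_f(x, b), y - f(x, b), rcond=None)[0]

n, k = x.size, b.size
MSE = np.sum((y - f(x, b)) ** 2) / (n - k)
D_hat = grad_f(x, b)
DtD_inv = np.linalg.inv(D_hat.T @ D_hat)

x0 = np.array([2.0])                                  # new predictor value (illustrative)
G_hat = grad_f(x0, b)                                 # 1 x k row of derivatives at b_OLS
q = (G_hat @ DtD_inv @ G_hat.T).item()                # G_hat (D_hat'D_hat)^{-1} G_hat'

t = 2.069                                             # tabulated upper 0.025 point of t, 23 df
gamma = 1.0                                           # assumed known variance multiplier
center = f(x0, b).item()
ci = (center - t * np.sqrt(MSE * q), center + t * np.sqrt(MSE * q))
pi = (center - t * np.sqrt(MSE * (gamma + q)), center + t * np.sqrt(MSE * (gamma + q)))
```

When analytic derivatives are inconvenient, `G_hat` and `D_hat` are often replaced by finite-difference approximations, at some cost in accuracy.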