Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Non Linear Regression - Data Analysis II - Notes | STAT 8320, Study notes of Statistics

University of Missouri (MU) - Columbia Statistics

Material Type: Notes; Class: Data Analysis II; Subject: Statistics; University: University of Missouri - Columbia; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009

koofers-user-z7o 🇺🇸

10 documents

1 / 24

This page cannot be seen from the preview

Don't miss anything!

Non-Linear Regression

•You have seen models that are linear in the param-

eters. For example,

–A model which is linear in the regressors and

linear in the parameters,

Yi=β0+β1X1i+β2X2i+²i.

–A model which is linear in the parameters and

nonlinear in the regressors,

Yi=β0+β1X1iX2i+β2X2

1i+²i.

•What happens if the model is non-linear in the pa-

rameters? For example

Yi=αXβ1

1iXβ2

2ie²i.

•We could consider re-parameterizing the model by

log(Yi) = log(α) + β1log(X1i) + β2log(X2i) + ²i.

Note that this model is equivalent to the nonlinear

model above.

•We could approximate a nonlinear model by a linear

model on a small interval.

F.1

Discover Study notes of Statistics University of Missouri (MU) - Columbia

Partial preview of the text

Download Non Linear Regression - Data Analysis II - Notes | STAT 8320 and more Study notes Statistics in PDF only on Docsity!

Non-Linear Regression

You have seen models that are linear in the param- eters. For example, - A model which is linear in the regressors and linear in the parameters,

Yi = β 0 + β 1 X 1 i + β 2 X 2 i + ≤i.

A model which is linear in the parameters and nonlinear in the regressors,

Yi = β 0 + β 1 X 1 iX 2 i + β 2 X 12 i + ≤i.

What happens if the model is non-linear in the pa- rameters? For example

Yi = αX 1 βi^1 X 2 βi^2 e≤i^.

We could consider re-parameterizing the model by

log(Yi) = log(α) + β 1 log(X 1 i) + β 2 log(X 2 i) + ≤i. Note that this model is equivalent to the nonlinear model above.

We could approximate a nonlinear model by a linear model on a small interval.

Nonlinear Models, Cont.

However, we will sometimes need to consider nonlinear models because

Many models cannot be made linear using a trans- formation:

Yi = αX 1 βi^1 X 2 βi^2 + ≤i.

We don’t wish to re-parameterize because there is a “natural” or “theoretical” parameterization. For example, the βs make physical sense, however, the transformed βs do not.
Some models are not well approximated by any simple linear model. For instance, Yi = α/{1 + exp(−β 0 − β 1 Xi)}.

0.0 0.2 0.4 0.6 0.8 1.

Logistic Curve

We cannot approximate this with a polynomial, as they go to ±∞.
No simple linear model fits “well”.

Fitting Nonlinear Regression Models

Least Squares:

The LS estimate of θ, ˆθ, is the set of parameters that minimizes the sum of squared residuals:

SSR(ˆθ) =

∑^ n

{Yi − f (x′ i; ˆθ)}^2.

We could equivalently consider this from a matrix perspective by writing

SSR(ˆθ) = [Y − f (ˆθ)]′[Y − f (ˆθ)],

where f (ˆθ) = [f (x′ 1 ; ˆθ), f (x′ 2 ; ˆθ), · · · , f (x′ n; ˆθ)]′^ and Y = [Y 1 , Y 2 , · · · Yn]′.

To obtain the normal equations we consider the partial derivatives of SSR(ˆθ) with respect to each θˆj, and set them equal to zero. This gives us a system of p equations. Each normal equation is given by

∂SSR(ˆθ) ∂θˆj

∑^ n

{Yi − f (x′ i; ˆθ)}

[

∂f (x′ i; ˆθ) ∂ˆθj

]

Unfortunately, for nonlinear models, the partial deriva- tives are functions of the parameters. Thus, the normal equations are nonlinear and an explicit so- lution for ˆθ cannot be obtained.

Simple Example

Exponential Growth Model:

Recall that this model has the form

Yi = θ 1 exp(θ 2 ti) + ≤i.

Then, the derivatives are

∂f ∂θ 1

= exp(θ 2 ti) and

∂f ∂θ 2

= θ 1 ti exp(θ 2 ti).

So, the normal equations are

∑^ n

[Yi − θˆ 1 exp(ˆθ 2 ti)][exp(ˆθ 2 ti)] = 0, and

∑^ n

[Yi − θˆ 1 exp(ˆθ 2 ti)][ˆθ 1 ti exp(ˆθ 2 ti)] = 0.

Even for this simple example, there is no analytic solu- tion for ˆθ. We will have to consider numerical methods.

Numerical Methods for Solving Nonlinear LS:

Grid Search:

Consider a “grid” of possible parameter values and find those that minimize the SSR.
Use progressively finer grids to achieve accuracy.
Inefficient and not practical when p large.

Gauss-Newton Algorithm, Cont.

We could then solve for δ = θ − θˆ(0)^ using ordinary least squares: ˆδ(j)^ = {F (ˆθ(j−1))′F (ˆθ(j−1))}−^1 F (ˆθ(j−1))′[Y − f (ˆθ(j−1))].
Finally, since δ = θ − ˆθ(0), we could solve for ˆθ(1)^ = θˆ(0)^ + ˆδ(1). We can view this as finding ˆθ(1), the new estimate of θ.
This new estimate, ˆθ(1), can then be used to find another new estimate, ˆθ(2), and so on.

Let’s find the values of f (ˆθ(0)) and F (ˆθ(0)) for the func-

tion f (xi; θ 1 , θ 2 ) = θ 1 exp(θ 2 xi), where there are three observations, (x 1 , y 1 ), (x 2 , y 2 ), and, (x 3 , y 3 ).

Gauss-Newton Algorithm, Cont.

The Gauss-Newton Algorithm proceeds according to the following steps:

Obtain an initial estimate.
Perform the Taylor series expansion, and calculate f (ˆθ(j)) and F (ˆθ(j)).
Use OLS to find ˆδ(j+1), the adjustment to the cur- rent estimate.
Find a new estimate of θ. Use this estimate to return to step 2.
Continue until

(a) There is only a minor change in the objective function,

|SSR(ˆθ(j+1)) − SSR(ˆθ(j))| SSR(ˆθ(j))

< γ 1 , and

(b) There is only a minor change in the parameter estimates,

|θˆ(j+1)^ − ˆθ(j)| < γ 2.

Modified Gauss-Newton Algorithm

We can avoid overstepping using the Modified Gauss- Newton Algorithm. In this case, we will define a new proposal for θ as

θˆ(j+1)^ = ˆθ(j)^ + αjˆδ(j+1), 0 < αj ≤ 1.

Here, the value of αj is used to modify the step length. There are many different ways that we could choose αj. The simplest is

Step Halving: Here, we will set

ˆθ(j+1)^ = ˆθ(j)^ +^1 2 k^

ˆδ(j+1),

where k is the smallest non-negative integer such that

SSR(ˆθ(j)^ +

2 k^

ˆδ(j+1)) < SSR(ˆθ(j)).

That is, try ˆδ(j+1), then ˆδ(j+1)/2, ˆδ(j+1)/4, etc.

SAS PROC NLIN uses this procedure for k = 0, 1 , · · · , 10.

Other Algorithms:

In general, convergence algorithms have the form

θˆ(j+1)^ = ˆθ(j)^ + αjAj^ ∂Q(ˆθ

(j))

∂θ

where Aj is some positive definite matrix, ∂Q(ˆθ(j))/∂θ is the gradient, based upon some function Q, which is typically the SSR.

Other Algorithms, Cont.

The modified Gauss-Newton algorithm satisfies this form with

θˆ(j+1)^ = ˆθ(j)^ − αj[F (ˆθ(j))′F (ˆθ(j))]−^1 ∂Q(ˆθ

(j))

∂θ

where here, Q = SSR.

Steepest Descent:

The steepest descent algorithm proceeds according to

ˆθ(j+1)^ = ˆθ(j)^ + αjIp×p^ ∂Q(ˆθ

(j))

∂θ

This algorithm is slow to converge, but it moves rapidly initially. Although it is seldom used anymore, it can be useful when starting values are not well known.

Levenberg-Marquardt:

The Levenberg-Marquardt algorithm has the form

θˆ(j+1)^ = ˆθ(j)^ − αj

[

F (ˆθ(j))′F (ˆθ(j)) + τ Ip×p

]− (^1) ∂Q(ˆθ(j))

∂θ

This method is a compromise between the Gauss-Newton and the steepest descent methods. It performs best when F (ˆθ(j))′F (ˆθ(j)) is nearly singular (F (ˆθ(j)) is not of full rank). This is essentially a ridge regression.

How do we choose τ? SAS starts with τ = 10−^3. If SSR(ˆθ(j+1)) < SSR(ˆθ(j)), then τ = τ /10 for the next iteration, otherwise, τ = 10τ.

Other Algorithms, Cont.

Quasi-Newton:

For the Quasi-Newton method, we will update θ accord- ing to

ˆθ(j+1)^ = ˆθ(j)^ − αjH j−^1 ∂Q(ˆθ

(j))

∂θ

where H is a positive definite approximation of the the Hessian, which gets closer as j → ∞. Hj is computed iteratively, and this method is best among first-order methods (only first derivatives are required).

Derivative Free Methods:

Ex. Secant Method: This is like the Gauss-Newton method, but it calculates the derivatives numerically, from past iterations.

These methods generally work well.

Practical Considerations:

There are several practical considerations that you must be aware of when using these techniques.

Failure to converge:
- SSR(θ) may be “flat” in a neighborhood of the minimum.

Practical Considerations, Cont.

Failure to Converge:
- Using better starting values can alleviate this problem.
- This may also suggest that the model is too complex for the data, and a simpler model should be considered.
Convergence to a Local Minimum:
- Linear least squares has the property that SSR(θ) = (Y −Xβ)′(Y −Xβ), which is quadratic and there- fore has a unique minimum.
- Nonlinear least squares need not have a unique minimum.
- Using multiple starting values can identify this problem if it exists.
- If the dimension of θ is low, graph SSR(θ) as a function of θi.
Starting Values:
- To converge, algorithms need good initial esti- mates.
- Prior or theoretical information can be used.

Practical Considerations, Cont.

Starting Values, Cont.
- Model interpretation, cont. Furthermore, when x = 0, then Y = θ 0 /{1 + exp(−θ 1 )}. Thus, an estimate of θ 1 can be obtained by solving

θˆ 0 (0) 1 + exp(−θˆ(0) 1 )

= Y (0).

Finally, notice that Y takes on the value θ 0 / 2 when x = −θ 1 /θ 2. Thus, we can find the x value which corresponds to the “average” Y , and use it to solve

θˆ(0) 1 θ^ ˆ(0) 2

= x.

We could then use ˆθ 0 (0) , ˆθ 1 (0) , and ˆθ(0) 2 as starting values.

Constrained Parameters: often we have constraints on parameters like θi > a or a < θi < b. - Try fitting the model ignoring the constraints and see if the converged parameter estimates satisfy the constraints. If the results satisfy the constraints, no further action is needed.

Practical Considerations, Cont.

Constrained Parameters, cont.
- Try re-parameterizing: for example, for the con- straint θi > a we could write

θi = a + θi − a = a + exp{log(θi − a)} = a + exp(αi). Here, −∞ < αi < ∞ and this parametrization is unconstrained. Or, for the constraint a < θi < b we could write

θi = a +

b − a 1 + exp(−γi)

Again, −∞ < γi < ∞ and this parametrization is unconstrained.

Estimation of σ^2. By analogy to the linear case, we will use

s^2 =

∑n i=1[Yi^ −^ f^ (xi;^ θ)]

n − p

where p is the number of parameters in θ (not the number of regressors in xi).

Linear Functions of Parameters, Cont.

Using this statistic, we can form an approximate 100(1− α)% confidence interval for a′θ as

a′θˆ ± t(1−α/ 2 ,n−p)s(a′[F (θ)′F (θ)]−^1 a)^1 /^2.

To evaluate this interval, we can substitute ˆθ into F (θ).

As a specific case, suppose that a′^ = (0, 0 , · · · , 0 , 1 , 0 , · · · , 0) where the jth element of a is 1 and all other elements are 0. Then, a confidence interval for the jth element of θ is

θˆj ± t(1−α/ 2 ,n−p)s

ˆcjj,

where ˆcjj^ is the jth diagonal element of [F (θ)′F (θ)]−^1.

Nonlinear Functions of the Parameters:

More often, we are interested in nonlinear functions of θ. Suppose that h(θ) is some nonlinear function of the parameters. Note:

h(ˆθ) ≈ h(θ) + h′(ˆθ − θ),

where h = (∂h/∂θ 1 , · · · , ∂h/∂θp)′.

Then,

E{h(ˆθ)} ≈ h(θ) and var{h(ˆθ)} ≈ σ^2 h′[F (θ)′F (θ)]−^1 h.

Thus, h(ˆθ) ∼ AN(h(θ), σ^2 h′[F (θ)′F (θ)]−^1 h) and an ap- proximate 100(1 − α)% confidence interval for h(θ) is

h(ˆθ) ± t(1−α/ 2 ,n−p)s(h′[F (θ)′F (θ)]−^1 h)^1 /^2.

Confidence Intervals for Prediction

Suppose that we are interested in a prediction interval for Y at x = x 0. Notice

y 0 = f (x 0 ; θ) + ≤ 0 , ≤ 0 ∼ N(0, σ^2 ), and ˆy 0 = f (x 0 ; ˆθ).

When n is large, ˆθ is close to θ, so we will use the Taylor series expansion

f (x 0 , θˆ) ≈ f (x 0 , θ) + f 0 (θ)′(ˆθ − θ),

where f 0 (θ) = (∂f (x 0 , θ)/∂θ 1 , · · · , ∂f (x 0 , θ)/∂θp)′. Then,

y 0 − ˆy 0 ≈ y 0 − f (x 0 , θ) − f 0 (θ)′(ˆθ − θ) = ≤ 0 − f 0 (θ)′(ˆθ − θ).

From this form, we can see that

E(y 0 − ˆy 0 ) ≈ E(≤ 0 ) − f 0 (θ)′E(ˆθ − θ) = 0,

and

var(y 0 − yˆ 0 ) ≈ var(≤ 0 ) − var{f 0 (θ)′(ˆθ − θ)} = σ^2 + σ^2 f 0 (θ)′[F (θ)′F (θ)]−^1 f 0 (θ) = σ^2 {1 + f 0 (θ)′[F (θ)′F (θ)]−^1 f 0 (θ)}

Thus, y 0 − yˆ 0 ∼ AN(0, σ^2 {1 + f 0 (θ)′[F (θ)′F (θ)]−^1 f 0 (θ)}).

Now, since s is asymptotically independent of θˆ (and thus of y 0 − ˆy 0 ),

Non Linear Regression - Data Analysis II - Notes | STAT 8320, Study notes of Statistics

Related documents

Partial preview of the text

Download Non Linear Regression - Data Analysis II - Notes | STAT 8320 and more Study notes Statistics in PDF only on Docsity!

Non-Linear Regression

Nonlinear Models, Cont.

Fitting Nonlinear Regression Models

[

]

Simple Example

Gauss-Newton Algorithm, Cont.

Gauss-Newton Algorithm, Cont.

Modified Gauss-Newton Algorithm

Other Algorithms, Cont.

[

Other Algorithms, Cont.

Practical Considerations, Cont.

Practical Considerations, Cont.

= Y (0).

Practical Considerations, Cont.

Linear Functions of Parameters, Cont.

Confidence Intervals for Prediction