Curve Equations: Linear and Non-Linear Regression in SAS, Study Notes for Forest Engineering

This document provides information on how to estimate curve models using SAS, including the equations, derivatives, and linearized forms. It discusses the least squares and maximum likelihood methods and provides sample SAS programs. It also covers convergence and how to determine starting values for the parameters.

Type: Study notes

2012

Shared on 27/03/2012

jadson-coelho-de-abreu-12
BIOMETRICS
INFORMATION
HANDBOOK NO. 4 MARCH 1994
Catalog of Curves
for Curve Fitting
Biometrics Information Handbook Series
Province of British Columbia
Ministry of Forests


CATALOGUE OF CURVES

FOR CURVE FITTING

by Vera Sit and Melanie Poulin-Costello

Series Editors: Wendy Bergerud and Vera Sit

Ministry of Forests Research Program


ACKNOWLEDGEMENTS

The authors would like to thank all the reviewers for their valuable comments. Special thanks go to Jeff Stone, Jim Goudie, and Gordon Nigh for their suggestions to make this handbook more comprehensive. We also want to thank Amanda Nemec and Hong Gao for checking all the derivatives and graphs in this handbook.


TABLE OF CONTENTS

1 INTRODUCTION

This handbook is a collection of linear and non-linear models for fitting experimental data. It is intended to help researchers fit appropriate curves to their data.

Curve fitting, also known as regression analysis, is a common technique for modelling data. The simplest use of a regression model is to summarize the observed relationships in a particular set of data. More importantly, regression models are developed to describe the physical, chemical and biological processes in a system (Rawlings 1988).

This handbook is organized in the following manner:

  • Section 2 defines the terminology and briefly describes the general idea of regression.
  • Section 3 discusses the use of the SAS procedures PROC REG and PROC NLIN for linear and non-linear curve fitting.
  • Section 4 presents eight classes of curves frequently used for modelling data. Each class contains several curves which are described in detail. For each curve, the equation, the derivatives, and the linearized form of the equation are provided, as well as sample plots and SAS programs for fitting the curve.
  • Section 5 explains how to use this handbook for curve fitting. Some strategies for selecting starting values and the concept of convergence are also discussed. Two examples are given to illustrate the curve-fitting procedures.
  • Section 6 provides a brief introduction to various basic attributes of curves. Also included are the corresponding algebra and calculus for identifying these attributes.

A strong statistical or calculus background is not required to use this handbook. Although the discussions and examples are based on SAS programs and assume some knowledge of programming, they should be of general interest for anyone fitting curves.

This handbook does not provide an in-depth discussion of regression analysis. Rather, it should be used in conjunction with a reliable text on linear and non-linear regression such as Ratkowsky (1983) or Rawlings (1988).

2 REGRESSION ANALYSIS

In regression, a model or a mathematical equation is developed to describe the behaviour of a variable of interest. The variable may be the growth rate of a tree, body weight of a grizzly bear, abundance of fish, or percent cover of vegetation. This is called the dependent variable and is denoted by Y. Other variables that are thought to provide information on the behaviour of the dependent variable are incorporated into the equation as explanatory variables. These variables (e.g., seedling diameter, chest size of a grizzly bear, volume of cover provided by large woody debris in streams, or light intensity) are called independent variables and are denoted by X. They are assumed to be known without error.

In addition to the independent variables, a regression equation also involves unknown coefficients called parameters that define the behaviour of the model. Parameters are denoted by lowercase letters (a, b, c, etc.). A response curve (or "response surface", when many independent variables are involved) represents the true relationship between the dependent and independent variables. In regression analysis, a model is developed to approximate this response curve, a process that estimates parameters from the available data. A regression model has the general form:

Y = ƒ (X) + E

where ƒ (X) is the expected model and E is the error term. For simplicity, the error term E is omitted in subsequent sections.

A linear model is one that is linear in the parameters — that is, each term in the model contains only one parameter which is a multiplicative constant of the independent variable (Rawlings 1988, p. 374). All other models are non-linear. For example:

Y = a X + b

and Y = c X^2 + d X + g

are linear models with parameters a and b , and c , d , and g , respectively. But:

Y = a X^b

and Y = a sin(bX)

are non-linear models.

Some non-linear models are intrinsically linear. This means they can be made linear with an appropriate transformation. For example, the model:

Y = a e^(bX)

is intrinsically linear. It can be converted to linear form with the natural logarithm (ln) transform:

ln(Y) = ln(a) + b X

However, some non-linear models cannot be linearized. The model:

Y = sin(bX)

is an example.

Linear models are very restrictive and are usually used for first-order approximations to true relationships. On the other hand, non-linear models such as the inverse polynomial models, exponential growth models, logistic model and Gompertz model, are often more realistic and flexible. In some cases, non-linear models will have fewer parameters to be estimated and are therefore simpler than linear models.

Once a curve or model has been fitted (i.e., the parameters are estimated) to a set of data, it can be used to estimate Y for each value of X. The deviation or difference of the observed Y from its estimate is called a residual. It is a measure of the agreement between the model and the data.

The parameters in a model can be estimated by the least squares method or the maximum likelihood method. With the least squares method the fitted model should have the smallest possible sum of squared residuals, while with the maximum likelihood method the estimated parameters should maximize the likelihood of obtaining the particular sample. Under the usual assumptions that the residuals are independent with zero mean and common variance σ^2 , the least squares estimators will be the best (minimum variance) among all possible linear unbiased estimators. When the normality assumption is satisfied, least squares estimators are also maximum likelihood estimators.
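The link between the two criteria can be written out explicitly. The sketch below assumes n observations (X_i, Y_i) and independent normal errors with common variance σ^2; the symbols SS, L, θ, and n are notation introduced here for illustration and do not come from the handbook.

```latex
% Least squares: choose the parameters \theta = (a, b, \ldots) that minimize
SS(\theta) = \sum_{i=1}^{n} \bigl[ Y_i - f(X_i; \theta) \bigr]^2

% Log-likelihood under independent normal errors with variance \sigma^2:
\ln L(\theta, \sigma^2) = -\frac{n}{2} \ln\bigl(2\pi\sigma^2\bigr) - \frac{SS(\theta)}{2\sigma^2}

% For any fixed \sigma^2, \ln L is largest exactly where SS(\theta) is smallest,
% so both criteria select the same \theta.
```

Under these assumptions the parameter values enter the likelihood only through SS, which is why the two methods agree when the errors are normal.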

3 CURVE FITTING WITH SAS

Two SAS procedures are available for curve fitting: PROC REG and PROC NLIN. PROC REG is suitable for fitting linear models whereas PROC NLIN is used for fitting non-linear models. Intrinsically linear models can also be fitted with PROC REG provided that the appropriate transformations are known. Both SAS procedures use the least squares method for estimating the model parameters.
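As a sketch of how such a transformation works, consider the power model Y = a X^b, which the rest of this section uses as its running example. Assuming X, Y, and a are all positive (an assumption needed for the logarithms to exist), taking natural logarithms of both sides gives a form that is linear in ln(a) and b:

```latex
Y = a X^{b}
\quad\Longrightarrow\quad
\ln(Y) = \ln(a) + b \ln(X)
```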

or Z = c + b U

where Z = ln(Y), c = ln(a), and U = ln(X). Before using PROC REG, the natural log transformation of Y and X must be performed in a data step. If the variables Y and X were stored in a SAS data set called OLD, then the following SAS code^1 could be used to do the regression:

DATA NEW;
  SET OLD;
  Z = LOG(Y); U = LOG(X);
PROC REG DATA = NEW;
  MODEL Z = U;
RUN;

Once the regression is completed, the fitted parameters can be transformed to the parameters in the non-linear model. Continuing with the example, the parameter a in the non-linear model can be obtained from the intercept of the fitted linear model with the exponential function:

a = e^c

Parameter b has the same value in both the linear and non-linear model.
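The back-transformation can also be done inside SAS. The following is a minimal sketch, assuming the variables are named as in the example above; the data set names EST and BACK are hypothetical, and the OUTEST= option of PROC REG is used to save the fitted coefficients.

```sas
/* Sketch only: EST and BACK are illustrative data set names. */
DATA NEW;
  SET OLD;
  Z = LOG(Y);   /* natural log of the response  */
  U = LOG(X);   /* natural log of the predictor */
RUN;

/* OUTEST= writes the fitted coefficients to the data set EST. */
PROC REG DATA = NEW OUTEST = EST;
  MODEL Z = U;
RUN;
QUIT;

/* Back-transform: a = exp(c); b is the coefficient of U. */
DATA BACK;
  SET EST;
  A = EXP(INTERCEPT);  /* the intercept column is named INTERCEP in older SAS releases */
  B = U;               /* OUTEST= stores each slope under its regressor's name */
  KEEP A B;
RUN;

PROC PRINT DATA = BACK;
RUN;
```

The values of A and B obtained this way are the estimates for the multiplicative-error form of the model, so they need not match what PROC NLIN would return for the same data.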

When a linearized model is fitted with PROC REG, the error term is also transformed. In this example, the fitted regression model is:

ln(Y) = ln(a) + b ln(X) + E (1)

The error term E in this model is additive. When the model is transformed back to its non-linear form, it becomes:

Y = a X^b ⋅ e^E (2)

That is, the actual model fitted has a multiplicative error term. The SAS procedure PROC NLIN, however, fits the model:

Y = a X^b + E (3)

If E in the linear model (1) is normally distributed, then e^E in the non-linear model (2) is log-normally distributed, which is different from the normal distribution assumed in the non-linear model (3). Because of the different error structures in models (2) and (3), different parameter estimates may result. An example is given in Section 5.6.

In this handbook, the linearized forms, the relationships between parameters in the linear and non-linear forms, and the SAS programs for fitting the linearized models are provided for all intrinsically linear models.

3.3 Non-linear Regression Using PROC NLIN

Non-linear models are more difficult to specify and estimate than linear models. In SAS, they can be fitted with the procedure NLIN. To use PROC NLIN, one must write the regression expression, declare parameter names, guess the starting values of the parameters, and possibly specify the derivatives of the model with respect to the parameters. PROC NLIN is an iterative procedure with five methods available as options:

  • gradient method
  • Newton method
  • modified Gauss-Newton method
  • Marquardt method
  • multivariate secant or false position (DUD) method

All methods except DUD (Does not Use Derivatives) require the partial derivative of the model with respect to each parameter (i.e., ∂Y/∂a, ∂Y/∂b, etc.). The Newton method also requires the second derivatives of the model with respect to each parameter (i.e., ∂^2 Y/∂a^2, ∂^2 Y/∂b^2, etc.).

When derivatives are provided, Gauss-Newton is the default method of estimation; otherwise, DUD is the default method. Of the two default methods, the Gauss-Newton method is faster.

^1 In SAS, the function LOG is the natural logarithm (ln). Logarithm to the base 10 can be requested with the SAS function LOG10.

To demonstrate the structure of a PROC NLIN step, let us suppose that the model Y = a X^b is to be fitted to a set of data. The following example carries out the regression:

PROC NLIN DATA = POINT;
  PARAMETER A = 1.0 B = 0.5;
  MODEL Y = A * X ** B;
  DER.A = X ** B;
  DER.B = A * LOG(X) * X ** B;
RUN;

The PROC NLIN statement invokes the SAS procedure. It has several options.

DATA = SASdataset;

names the SAS data set to be analyzed by PROC NLIN. If the DATA = option is omitted then the most recently created SAS data set is used.

METHOD = iteration method;

specifies the iterative method NLIN uses. If the METHOD = option is omitted and the DER statement is specified, then METHOD = GAUSS is used. If the METHOD = option is not specified and the DER statement is absent, then METHOD = DUD is used.
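To make the METHOD = option concrete, here is a minimal sketch that requests the Marquardt method for the same power model. The data set POINT and the starting values are carried over from the example above and remain arbitrary; PARMS is used here as the documented short form of the parameter statement.

```sas
/* Sketch: same model as above, but with the iteration method named explicitly. */
PROC NLIN DATA = POINT METHOD = MARQUARDT;
  PARMS A = 1.0 B = 0.5;          /* arbitrary starting values        */
  MODEL Y = A * X ** B;           /* Y = a X^b                        */
  DER.A = X ** B;                 /* partial derivative w.r.t. a      */
  DER.B = A * LOG(X) * X ** B;    /* partial derivative w.r.t. b      */
RUN;
```

Naming the method explicitly can be worth trying when the default iterations fail to converge from a given set of starting values.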

Other options are available to control the iteration process. See Chapter 23 of the SAS/STAT User’s Guide (SAS Institute Inc. 1988a) for more detail.

The PARAMETER statement defines the parameters and their starting values. It has the form:

PARAMETER parameter = starting values... ;

A range of starting values may also be specified with this statement. For example:

PARAMETER A = 0 TO 15 BY 5;

specifies four starting values (0, 5, 10, and 15) for A.

PARAMETER A = 1, 7, 9;

specifies three starting values (1, 7, and 9).

The PARAMETER statement must follow the PROC NLIN statement. See Chapter 23 of the SAS/STAT User’s Guide (SAS Institute Inc. 1988a) for more details.
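Ranges can be given for several parameters at once. The sketch below is an assumption-laden illustration: it reuses the power model and the data set POINT from the earlier example, and the grid limits are arbitrary. With a grid like this, PROC NLIN evaluates the residual sum of squares at every combination of starting values and begins iterating from the best one.

```sas
/* Sketch: a 4 x 5 grid of starting values (20 combinations in total). */
PROC NLIN DATA = POINT;
  PARMS A = 0 TO 15 BY 5            /* 0, 5, 10, 15             */
        B = 0.2 TO 1.0 BY 0.2;      /* 0.2, 0.4, 0.6, 0.8, 1.0  */
  MODEL Y = A * X ** B;
  DER.A = X ** B;
  DER.B = A * LOG(X) * X ** B;
RUN;
```

A coarse grid is usually enough; its purpose is only to land the iterations in a sensible region, not to find the final estimates.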

The MODEL statement defines the equation to be fitted. It has the form:

MODEL dependent variable = expression;

The expression can be any valid SAS expression.

4 A CATALOG OF CURVES

The functional form is the equation of the model. It is stated as:

Y = ƒ(X)

Standard functional names are used if possible. Otherwise, the functions are named sequentially as Type I or Type II. The first derivative of Y with respect to X (i.e., dY/dX) is provided for determining special features of a curve such as the maximum, minimum, and point of inflection. These features and the technique to identify them are discussed in Section 6. The partial derivatives of Y with respect to model parameters (e.g., ∂Y/∂a, ∂Y/∂b) are also given as they are required for PROC NLIN (but not for the derivative-free [DUD] method).

For intrinsically linear models, the linearized form and the relationship between the parameters in the linear and non-linear forms are supplied. Also included is a short description of the curve and the roles the parameters play in determining the behaviour of the curve. Sample SAS programs for doing linear or non-linear regression are provided for readers’ convenience. To use these programs, the X’s and Y’s should be replaced by the appropriate variable names. In addition, appropriate starting values must be substituted if PROC NLIN is to be used (the values in the sample programs are arbitrary).

Finally, a number of graphs are presented for each curve. Each graph shows the impact of changing a parameter on the curve when other parameters are kept constant. These graphs are helpful for selecting models and parameter starting values.

The domain of a function is the set of X-values for which the function is defined. For most of the models presented in this section, the domain is the set of real numbers: − ∞ < X < ∞.

4.1 Polynomials

A polynomial of degree n has the form:

Y = c_0 + c_1 X + c_2 X^2 + … + c_(n−1) X^(n−1) + c_n X^n

where the c_i’s are real numbers. They are the parameters, or coefficients, of the X terms. For example:

Y = 1 + 5X + 2X^2 (1)

is a polynomial of degree 2 with parameters c_0 = 1, c_1 = 5, and c_2 = 2. The equation:

Y = 4X^3 (2)

is a polynomial of degree 3 with parameters c_0 = c_1 = c_2 = 0 and c_3 = 4.

Polynomials are unbounded — that is, as X increases indefinitely, the function Y increases or decreases without limit.

A criterion for selecting a functional form is the number of extremums (i.e., maximum or minimum or both). A polynomial of degree n has at most n-1 extremums. For example, equation (1) is a polynomial of degree 2 with one minimum; equation (2) is a polynomial of degree 3 with no extremum. A polynomial of degree 1 is a straight line with no extremum.
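The two claims about equations (1) and (2) can be checked with the first and second derivatives, using the technique the handbook covers in Section 6. This is a worked sketch of that check, not a passage from the handbook.

```latex
% Equation (1): Y = 1 + 5X + 2X^2
\frac{dY}{dX} = 5 + 4X = 0 \;\Longrightarrow\; X = -\frac{5}{4},
\qquad
\frac{d^2Y}{dX^2} = 4 > 0 \quad \text{(so this point is a minimum)}

Y\!\left(-\tfrac{5}{4}\right) = 1 - \frac{25}{4} + \frac{25}{8} = -\frac{17}{8}

% Equation (2): Y = 4X^3
\frac{dY}{dX} = 12X^2 \ge 0 \quad \text{for all } X
```

For equation (2) the derivative is zero only at X = 0 and does not change sign there, so the curve keeps increasing and has no extremum; X = 0 is a point of inflection instead.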

A polynomial is linear with respect to its parameters, and therefore can be fitted with PROC REG.

Polynomials are a common choice for curve fitting because they are simple to use. Lower degree polynomials are easier to interpret than higher degree polynomials. For this reason, only polynomials of degree one, two, and three are included here. Although polynomials can provide a very good fit to data, they extrapolate poorly and should be used with caution.

Any continuous response function can be approximated to any level of precision desired by a polynomial of appropriate degree. Because of this flexibility, it is easy to "overfit" a set of data with polynomial models. Thus, an excellent fit of a polynomial model (or for that matter, any model) cannot be interpreted as an indication that it is, in fact, the true model.

4.1.1 First degree polynomial: linear

Functional form: Y = a + bX,  −∞ < X < ∞

Derivatives: dY/dX = b

∂Y/∂a = 1,  ∂Y/∂b = X

Linearized model and parameters: This functional form is already linear.

Description: This is the equation of a straight line. Parameter a is the Y-intercept and it controls the vertical position of the line. Parameter b is the slope of the line.

Sample PROC REG program:

PROC REG DATA = LINEAR;
  MODEL Y = X;
RUN;

PROC NLIN is unnecessary because this is a linear model.

  • ACKNOWLEDGEMENTS
  • 1 INTRODUCTION
  • 2 REGRESSION ANALYSIS
  • 3 CURVE FITTING WITH SAS
    • 3.1 Linear Regression using PROC REG
    • 3.2 Non-linear Regression using PROC REG
    • 3.3 Non-linear Regression using PROC NLIN
  • 4 A CATALOG OF CURVES
    • 4.1 Polynomials
      • 4.1.1 First degree polynomial: linear
      • 4.1.2 Second degree polynomial: quadratic
      • 4.1.3 Third degree polynomial: cubic
    • 4.2 Inverse Polynomials
      • 4.2.1 First degree inverse polynomial: hyperbola
      • 4.2.2 Second degree inverse polynomial: inverse quadratic
      • 4.2.3 Third degree inverse polynomial: inverse cubic
      • 4.2.4 Rational function
      • 4.2.5 Mixed type function
    • 4.3 Exponential Functions
      • 4.3.1 Type I exponential function
      • 4.3.2 Type II exponential function
      • 4.3.3 Type III exponential function
      • 4.3.4 Type IV exponential function
      • 4.3.5 Type V exponential function
      • 4.3.6 Type VI exponential function
      • 4.3.7 Schumacher’s equation
      • 4.3.8 Modified Weibull equation
      • 4.3.9 Chapman-Richard’s equation
      • 4.3.10 Generalized logistic function
      • 4.3.11 Logistic function
      • 4.3.12 Gompertz function
      • 4.3.13 Schnute’s equation
    • 4.4 Power Function
    • 4.5 Combined Exponential and Power Functions
      • 4.5.1 Type I combined exponential and power function
      • 4.5.2 Type II combined exponential and power function
      • 4.5.3 Generalized Poisson function
    • 4.6 Logarithmic Function
    • 4.7 Trigonometric Functions
      • 4.7.1 Cosine function
      • 4.7.2 Sine function
      • 4.7.3 Arctangent function
    • 4.8 Common Distributions
      • 4.8.1 Normal distribution
      • 4.8.2 Student-t distribution
      • 4.8.3 Fisher F distribution
      • 4.8.4 Chi-square distribution
  • 5 CURVE FITTING METHODS
    • 5.1 How to Select Models
    • 5.2 How to Choose Starting Values for PROC NLIN
    • 5.3 What is Convergence?
    • 5.4 Model Comparisons and Model Validation
    • 5.5 Example
      • 5.5.1 Fitting the model using PROC REG
      • 5.5.2 Fitting the model using PROC NLIN
    • 5.6 Example
      • 5.6.1 Fitting the model using PROC REG
      • 5.6.2 Fitting the model using PROC NLIN
      • 5.6.3 Statistical consideration for a "better" model
      • 5.6.4 Subjective consideration for a "better" model
  • 6 BASIC ATTRIBUTES OF CURVES
    • 6.1 Increasing and Decreasing Functions
    • 6.2 Symmetric Functions
    • 6.3 Asymptotes
    • 6.4 Concavity of a Function
    • 6.5 Maximum and Minimum
    • 6.6 Point of Inflection
  • APPENDIX 1 Gamma function
  • Bibliography
  • TABLES
    • 1. Lima bean yield data
    • 2. San Diego population data from 1860 to
  • FIGURES
    • 1. Lima bean yield versus harvest dates
    • 2. San Diego population versus decades from first census
    • 3. Fitted models and the observed data
    • 4a. An increasing function
    • 4b. A decreasing function
    • 5. A symmetric function
    • 6. A function with asymptotes
    • 7a. A concave upward function
    • 7b. A concave downward function
    • 8a. A function with a local maximum
    • 8b. A function with a local minimum
    • 9. The graph of Y = 8X^5 − 5X^4 − 20X^3 showing the local extremums and inflection points
  • Functional form: Y = a + bX X >

Functional form: Y = A(X − B)^2 + C

4.1.3 Third degree polynomial: cubic

Functional form: Y = a + bX + cX^2 + dX^3,  −∞ < X < ∞

Derivatives: dY/dX = b + 2cX + 3dX^2

∂Y/∂a = 1,  ∂Y/∂b = X,  ∂Y/∂c = X^2,  ∂Y/∂d = X^3

Linearized model and parameters: This functional form is already linear.

Description: The basic shape of a cubic is a sideways "S". Parameter a shifts the curve up and down the Y-axis. The parameters b, c, and d work together to make the S-shape flatter or deeper. While the above form is linear, the role of the parameters is more clearly understood in the alternative form:

Y = A (X − B) (X − C) (X − (B + C)/2) + D

The parameters in the two forms are related as follows:

a = D − ABC (B + C)/2
b = A (B + C)^2/2 + ABC
c = −3A (B + C)/2
d = A

Also:

A = d
B = (1/2) [ −2c/(3d) + √( 4c^2/(9d^2) − (4/d) (b − 2c^2/(9d)) ) ]
C = (1/2) [ −2c/(3d) − √( 4c^2/(9d^2) − (4/d) (b − 2c^2/(9d)) ) ]
D = a − (c/(3d)) (b − 2c^2/(9d))

In the non-linear form of the cubic, parameter A scales the curve in the Y direction and D shifts the curve in the Y direction. Parameters B and C have the same effects on the shape of the curve; together they shift and squeeze the curve in the X direction. As parameters B and C are equivalent, only the graphs that show the effect of varying B are presented in the following pages.
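The relationships between the two sets of parameters follow from multiplying out the alternative form and matching powers of X against Y = a + bX + cX^2 + dX^3. The expansion below is a worked sketch of that step.

```latex
(X - B)(X - C) = X^2 - (B + C)X + BC

A (X - B)(X - C)\left(X - \tfrac{B + C}{2}\right) + D
  = A X^3
  - \tfrac{3}{2} A (B + C) X^2
  + A\left[\tfrac{(B + C)^2}{2} + BC\right] X
  + D - \tfrac{1}{2} A B C (B + C)

% Matching coefficients:
d = A, \qquad
c = -\tfrac{3}{2} A (B + C), \qquad
b = \tfrac{1}{2} A (B + C)^2 + ABC, \qquad
a = D - \tfrac{1}{2} ABC (B + C)
```

As a numerical check with the arbitrary starting values used in the sample PROC NLIN program for this curve (A = 0.8, B = −0.5, C = −0.4, D = 2), these formulas give d = 0.8, c = 1.08, b = 0.484, and a = 2.072.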

Sample PROC REG program for the functional form:

PROC REG DATA = CUBIC;
  MODEL Y = X X2 X3;
  * X2 = X*X and X3 = X*X*X must be created in a previous data step;
RUN;
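The data step that creates the squared and cubed terms could look like the sketch below. RAW is a hypothetical name for the input data set holding X and Y; any existing data set with those two variables would serve.

```sas
/* Sketch: build the polynomial terms before calling PROC REG. */
DATA CUBIC;
  SET RAW;          /* RAW is an assumed input data set containing X and Y */
  X2 = X * X;       /* squared term */
  X3 = X * X * X;   /* cubed term   */
RUN;

PROC REG DATA = CUBIC;
  MODEL Y = X X2 X3;   /* fits Y = a + bX + cX^2 + dX^3 */
RUN;
QUIT;
```

The intercept reported by PROC REG corresponds to a, and the coefficients of X, X2, and X3 correspond to b, c, and d.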

Sample PROC NLIN program for the alternative form:

PROC NLIN DATA = CUBIC2;
  PARAMETER A = 0.8 B = -0.5 C = -0.4 D = 2;
  XB = X - B; XC = X - C; BC2 = (B + C) / 2;
  MODEL Y = A*XB*XC*(X - BC2) + D;
  DER.A = XB*XC*(X - BC2);
  DER.B = A*XC*(B + 0.5*C - 1.5*X);
  DER.C = A*XB*(C + 0.5*B - 1.5*X);
  DER.D = 1;
RUN;
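The expression used in DER.B can be verified with the product rule, since B appears in two of the three factors of the alternative form. The derivation below is a worked sketch; DER.C follows in the same way with B and C interchanged.

```latex
Y = A (X - B)(X - C)\left(X - \tfrac{B + C}{2}\right) + D

\frac{\partial Y}{\partial B}
  = A (X - C)\left[ -\left(X - \tfrac{B + C}{2}\right) - \tfrac{1}{2}(X - B) \right]
  = A (X - C)\left( B + \tfrac{1}{2} C - \tfrac{3}{2} X \right)
```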