Curve Fitting Techniques - Lecture Notes | CGN 3421, Study notes of Civil Engineering

Material Type: Notes; Class: COMP METH CIVIL ENGR; Subject: CIVIL ENGINEERING; University: University of Florida; Term: Unknown 1989;


Uploaded on 03/18/2009

CGN 3421 - Computer Methods Gurley
Numerical Methods Lecture 5 - Curve Fitting Techniques
Topics:
  • motivation
  • interpolation
  • linear regression
  • higher order polynomial form
  • exponential form
Curve fitting - motivation
For root finding, we used a given function to identify where it crossed zero, that is, where does f(x) = 0?
Q: Where does this given function come from in the first place?
  • Analytical models of phenomena (e.g. equations from physics)
  • Create an equation from observed data
1) Interpolation (connect the data-dots)
If data is reliable, we can plot it and connect the dots.
This is piece-wise, linear interpolation.
This has limited use as a general function.
Since it’s really a group of small f(x)s, each connecting one point to the next, it doesn’t work very well for data that has built-in random error (scatter).
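
The connect-the-dots idea can be sketched in a few lines of Python. This is only an illustration: the function name `interp_linear` and the sample data are made up here and are not part of the course material.

```python
from bisect import bisect_right

def interp_linear(xs, ys, x):
    """Piece-wise linear interpolation through the points (xs[k], ys[k]).

    xs must be sorted in increasing order; x must lie inside [xs[0], xs[-1]].
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the data range")
    # Index of the segment [xs[k], xs[k+1]] that contains x.
    k = min(bisect_right(xs, x) - 1, len(xs) - 2)
    # Each segment is its own small straight line f(x) = a*x + b.
    a = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])
    b = ys[k] - a * xs[k]
    return a * x + b

# Made-up sample data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 1.0, 4.0]
print(interp_linear(xs, ys, 1.5))   # halfway between (1, 2) and (2, 1)
```

Every segment gets its own a and b, so any scatter in the data is reproduced exactly instead of being averaged out.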
2) Curve fitting - capturing the trend in the data by assigning a single function across the entire range.
The example below uses a straight line function
A straight line is described generically by f(x) = ax + b
The goal is to identify the coefficients ‘a’ and ‘b’ such that f(x) ‘fits’ the data well
[Figure: two panels. Interpolation: f(x) = ax + b for each line. Curve Fitting: f(x) = ax + b for entire range.]

Other examples of data sets that we can fit a function to:

Is a straight line suitable for each of these cases? No. But we’re not stuck with just straight line fits. We’ll start with straight lines, then expand the concept.

Linear curve fitting (linear regression)

Given the general form of a straight line, f(x) = ax + b

How can we pick the coefficients a and b that best fit the line to the data?

First question: What makes a particular straight line a ‘good’ fit?

Why does the blue line appear to us to fit the trend better?

  • Consider the distance between the data and points on the line
  • Add up the length of all the red and blue vertical lines
  • This is an expression of the ‘error’ between data and fitted line
  • The one line that provides a minimum error is then the ‘best’ straight line
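
A sketch of how this idea becomes equations for the straight line f(x) = ax + b. It assumes the error is measured as the sum of the squared vertical distances. The squaring is an assumption at this point in the notes, chosen to match the least squares error used for polynomials later. The error is written down, then differentiated with respect to each coefficient, and each derivative is set to zero:

```latex
\begin{aligned}
err &= \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2 \\
\frac{\partial\, err}{\partial a} &= -2 \sum_{i=1}^{n} x_i \left( y_i - a x_i - b \right) = 0 \\
\frac{\partial\, err}{\partial b} &= -2 \sum_{i=1}^{n} \left( y_i - a x_i - b \right) = 0
\end{aligned}
```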

[Figure: example data sets. Height of dropped object vs. time; oxygen in soil vs. temperature; pore pressure vs. soil depth; profit vs. paid labor hours.]

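Solve for a and b so that the previous two equations both equal zero. Re-write these two equations:

$$ a \sum x_i^2 + b \sum x_i = \sum (x_i y_i) $$

$$ a \sum x_i + b \cdot n = \sum y_i $$

Put these into matrix form:

$$ \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum (x_i y_i) \end{bmatrix} $$

What’s unknown? We have the data points $(x_i, y_i)$ for $i = 1, \ldots, n$, so we have all the summation terms in the matrix, so the unknowns are a and b. Good news: we already know how to solve this problem. Remember Gaussian elimination?

$$ A = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}, \quad X = \begin{bmatrix} b \\ a \end{bmatrix}, \quad B = \begin{bmatrix} \sum y_i \\ \sum (x_i y_i) \end{bmatrix} $$

so

$$ A X = B $$

Using built-in Mathcad matrix inversion, the coefficients a and b are solved:

>> X = A^-1 * B

Note: A, B, and X are not the same as a, b, and x.

Let’s test this with an example:

  i    1     2     3     4     5     6
  x    0     0.5   1.0   1.5   2.0   2.5
  y    0     1.5   3.0   4.5   6.0   7.5

First we find values for all the summation terms (n = 6):

$$ \sum x_i = 7.5, \quad \sum y_i = 22.5, \quad \sum x_i^2 = 13.75, \quad \sum (x_i y_i) = 41.25 $$

Note: we are using $\sum x_i^2$, NOT $\left( \sum x_i \right)^2$.

Now plugging into the matrix form gives us:

$$ \begin{bmatrix} 6 & 7.5 \\ 7.5 & 13.75 \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} 22.5 \\ 41.25 \end{bmatrix} $$

Solve with matrix inversion, or use Gaussian elimination...

The solution is b = 0, a = 3 ===> f(x) = 3x + 0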
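
The same calculation can be checked with a short Python sketch. The notes solve the system with Mathcad matrix inversion. Here Cramer's rule is swapped in for the 2x2 system to keep the example free of libraries, and the function name `fit_line` is made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares straight line f(x) = a*x + b via the 2x2 matrix form."""
    n = len(xs)
    sx = sum(xs)                               # sum of x_i
    sy = sum(ys)                               # sum of y_i
    sxx = sum(x * x for x in xs)               # sum of x_i^2, NOT (sum of x_i)^2
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    # Solve [[n, sx], [sx, sxx]] [b, a]^T = [sy, sxy]^T with Cramer's rule.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")
    b = (sy * sxx - sx * sxy) / det
    a = (n * sxy - sx * sy) / det
    return a, b

x = [0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 1.5, 3.0, 4.5, 6.0, 7.5]
a, b = fit_line(x, y)
print(a, b)   # 3.0 0.0
```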

This fits the data exactly. That is, the error is zero. Usually this is not the outcome. Usually we have data that does not exactly fit a straight line. Here’s an example with some ‘noisy’ data

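x = [0 .5 1 1.5 2 2.5], y = [-0.4326 -0.1656 3.1253 4.7877 4.8535 8.6909]

Form A and B from the summation terms exactly as before, then solve AX = B for the coefficients b and a to get our fit.

[Figure: plot of the noisy data and the straight line curve fit.]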
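
A minimal Python sketch of this noisy example, assuming the error is the sum of squared vertical distances. The helper names `fit_line` and `squared_error` are made up for illustration. It fits the line, then nudges each coefficient to show that the fitted pair gives the smallest error.

```python
def fit_line(xs, ys):
    """Least-squares straight line f(x) = a*x + b (2x2 system, Cramer's rule)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    return (n * sxy - sx * sy) / det, (sy * sxx - sx * sxy) / det

def squared_error(xs, ys, a, b):
    """Sum of squared vertical distances between the data and a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

x = [0, 0.5, 1, 1.5, 2, 2.5]
y = [-0.4326, -0.1656, 3.1253, 4.7877, 4.8535, 8.6909]

a, b = fit_line(x, y)
best = squared_error(x, y, a, b)
print(f"f(x) = {a:.3f} x + ({b:.3f}), error = {best:.3f}")   # a is about 3.56, b about -0.98

# The error is no longer zero, but moving either coefficient makes it larger.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    worse = squared_error(x, y, a + da, b + db)
    print(f"a{da:+.1f}, b{db:+.1f}: error = {worse:.3f}")
```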

So...what do we do when a straight line is not suitable for the data set?

Curve fitting - higher order polynomials

[Figure: profit vs. paid labor hours. A straight line will not predict the diminishing returns that the data shows.]

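The general expression for any error using the least squares approach is

$$ err = \sum (d_i)^2 = (y_1 - f(x_1))^2 + (y_2 - f(x_2))^2 + (y_3 - f(x_3))^2 + (y_4 - f(x_4))^2 \qquad (2) $$

where we want to minimize this error. Now substitute the form of our eq. (1)

$$ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots + a_j x^j = a_0 + \sum_{k=1}^{j} a_k x^k \qquad (1) $$

into the general least squares error eq. (2)

$$ err = \sum_{i=1}^{n} \left( y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 + \ldots + a_j x_i^j \right) \right)^2 \qquad (3) $$

where: n - # of data points given, i - the current data point being summed, j - the polynomial order.

Re-writing eq. (3):

$$ err = \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right)^2 \qquad (4) $$

Find the best line = minimize the error (squared distance) between line and data points. Find the set of coefficients $a_0, a_1, \ldots, a_j$ so we can minimize eq. (4).

CALCULUS TIME

To minimize eq. (4), take the derivative with respect to each coefficient $a_0, a_1, \ldots, a_j$ and set each to zero:

$$ \frac{\partial\, err}{\partial a_0} = -2 \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right) = 0 $$

$$ \frac{\partial\, err}{\partial a_1} = -2 \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right) x_i = 0 $$

$$ \frac{\partial\, err}{\partial a_2} = -2 \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right) x_i^2 = 0 $$

$$ \vdots $$

$$ \frac{\partial\, err}{\partial a_j} = -2 \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right) x_i^j = 0 $$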

Re-write these equations, and put into matrix form:

$$ \begin{bmatrix} n & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^j \\ \sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{j+1} \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \cdots & \sum x_i^{j+2} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum x_i^j & \sum x_i^{j+1} & \sum x_i^{j+2} & \cdots & \sum x_i^{j+j} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_j \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum (x_i y_i) \\ \sum (x_i^2 y_i) \\ \vdots \\ \sum (x_i^j y_i) \end{bmatrix} $$

where all summations above are over $i = 1, \ldots, n$ data points.

What’s unknown?

We have the data points $(x_i, y_i)$ for $i = 1, \ldots, n$. We want the coefficients $a_0, a_1, \ldots, a_j$.

We already know how to solve this problem. Remember Gaussian elimination? Call the matrix of summation terms A, the column of unknown coefficients X, and the right-hand side column B, so that AX = B.

Note: No matter what the order j, we always get equations LINEAR with respect to the coefficients. This means we can use the following solution method:

using built-in Mathcad matrix inversion, the coefficients are solved

>> X = A^-1 * B

Matrix inversion, or Gaussian elimination, gives us the solution for the coefficients.

This fits the data exactly. That is, f(x) = y since y = x^
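
A minimal pure-Python sketch of the polynomial matrix form, solved with Gaussian elimination. The function names `gauss_solve` and `polyfit_normal` are made up for illustration, and the test data (y = x^2 on the same x values as the straight line example) is an assumption, since the data for this example is not shown here.

```python
def gauss_solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n = len(B)
    # Work on an augmented copy [A | B] so the inputs are left untouched.
    M = [list(row) + [rhs] for row, rhs in zip(A, B)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    X = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * X[c] for c in range(r + 1, n))
        X[r] = (M[r][n] - known) / M[r][r]
    return X

def polyfit_normal(xs, ys, j):
    """Coefficients [a0, a1, ..., aj] of the least-squares polynomial of order j."""
    # A[r][c] = sum of x_i^(r+c); B[r] = sum of x_i^r * y_i.
    A = [[sum(x ** (r + c) for x in xs) for c in range(j + 1)] for r in range(j + 1)]
    B = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(j + 1)]
    return gauss_solve(A, B)

x = [0, 0.5, 1, 1.5, 2, 2.5]
y = [xi ** 2 for xi in x]          # assumed exact data, y = x^2
print(polyfit_normal(x, y, 2))     # close to [0, 0, 1]
```

With j = 1 the same function reduces to the straight line fit, returning [b, a].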

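Example #2: uncertain data

Now we’ll try some ‘noisy’ data

x = [0 .5 1 1.5 2 2.5]
y = [0.0674 -0.9156 1.6253 3.0377 3.3535 7.9409]

The resulting system to solve is the 3 x 3 matrix form with j = 2. Solving it gives the coefficients $a_0$, $a_1$, and $a_2$ of our fitted second order function, $f(x) = a_0 + a_1 x + a_2 x^2$.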

Example #3 : data with three different fits

In this example, we’re not sure which order will fit well, so we try three different polynomial orders.

Note: Linear regression, or first order curve fitting, is just the general polynomial form we just saw, where we use j = 1.

  • 2nd and 6th order look similar, but 6th has a ‘squiggle’ to it. We may not want that...

Overfit / Underfit - picking an inappropriate order

Overfit - over-doing the requirement for the fit to ‘match’ the data trend (order too high)

Polynomials become more ‘squiggly’ as their order increases. A ‘squiggly’ appearance comes from inflections in the function.

Consideration #1:

  • 3rd order - 1 inflection point
  • 4th order - up to 2 inflection points
  • nth order - up to n-2 inflection points

Consideration #2:

  • 2 data points - linear touches each point
  • 3 data points - second order touches each point
  • n data points - n-1 order polynomial will touch each point

SO: Picking an order too high will overfit data

General rule: pick a polynomial form at least several orders lower than the number of data points. Start with linear and add order until trends are matched.
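
Consideration #2 can be checked numerically. The sketch below takes the first four points of the noisy straight line data used earlier and fits them with a 1st order and a 3rd order polynomial. The helper names are made up for illustration, and the solver is a plain Gaussian elimination on the least squares matrix form.

```python
def gauss_solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n = len(B)
    M = [list(row) + [rhs] for row, rhs in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    X = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * X[c] for c in range(r + 1, n))
        X[r] = (M[r][n] - known) / M[r][r]
    return X

def polyfit_normal(xs, ys, j):
    """Coefficients [a0, ..., aj] of the least-squares polynomial of order j."""
    A = [[sum(x ** (r + c) for x in xs) for c in range(j + 1)] for r in range(j + 1)]
    B = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(j + 1)]
    return gauss_solve(A, B)

def max_residual(xs, ys, coeffs):
    """Largest vertical distance between the data and the fitted polynomial."""
    return max(abs(y - sum(a * x ** k for k, a in enumerate(coeffs)))
               for x, y in zip(xs, ys))

# First four points of the noisy straight-line data (true trend is linear).
x = [0, 0.5, 1, 1.5]
y = [-0.4326, -0.1656, 3.1253, 4.7877]

for order in (1, 3):
    coeffs = polyfit_normal(x, y, order)
    print(order, max_residual(x, y, coeffs))
# Order 3 (= n-1 for n = 4 points) touches every point, noise included;
# order 1 leaves a visible residual because it only follows the trend.
```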

Underfit - If the order is too low to capture obvious trends in the data

General rule: View data first, then select an order that reflects inflections, etc.

For the example above:

  1. Obviously nonlinear, so order > 1
  2. No inflection points observed as obvious, so order < 3 is recommended

=====> I’d use 2nd order for this data


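Curve fitting - exponential form

Consider a function of the general exponential form

$$ y = C e^{A x} $$

where we will seek C and A such that this equation fits the data as best it can. Again with the error: solve for the coefficients C and A such that the error is minimized:

$$ \text{minimize} \quad err = \sum_{i=1}^{n} \left( y_i - C e^{A x_i} \right)^2 $$

Problem: When we take partial derivatives with respect to C and A and set them to zero, we get two NONLINEAR equations with respect to C and A.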

So what? We can’t use Gaussian Elimination or the inverse function anymore. Those methods are for LINEAR equations only...

Now what?

Solution #1: Nonlinear equation solving methods
Remember we used Newton Raphson to solve a single nonlinear equation (root finding)? We can use Newton Raphson to solve a system of nonlinear equations.

Is there another way? For the exponential form, yes there is.

Solution #2: Linearization
Let’s see if we can do some algebra and a change of variables to re-cast this as a linear problem...

Given: pairs of data (x, y)

Find: a function to fit data of the general exponential form $y = C e^{A x}$

  1. Take the logarithm of both sides to get rid of the exponential: $\ln(y) = \ln(C e^{A x}) = A x + \ln(C)$

  2. Introduce the following change of variables: $Y = \ln(y)$, $X = x$, $B = \ln(C)$

Now we have: $Y = A X + B$, which is a LINEAR equation.

The original data points in the x-y plane get mapped into the X-Y plane.

This is called data linearization. The data is transformed as: $(x, y) \rightarrow (X, Y) = (x, \ln(y))$

Now we use the method for solving a first order linear curve fit

$$ \begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix} \begin{bmatrix} B \\ A \end{bmatrix} = \begin{bmatrix} \sum Y \\ \sum X Y \end{bmatrix} $$

for A and B, where above $Y = \ln(y)$, and $X = x$.

Finally, we operate on $B = \ln(C)$ to solve $C = e^{B}$.

And we now have the coefficients for $y = C e^{A x}$.

Example: repeat previous example, add exponential fit

Mathcad worksheet:

X := Create_Vector(-2, 4, .25)    Y := 1.6·exp(1.3·X)

f2 := regress(X, Y, 2)    f3 := regress(X, Y, 3)

fit2(x) := interp(f2, X, Y, x)    fit3(x) := interp(f3, X, Y, x)

ADDING NEW STUFF FOR EXP FIT

Y2 := ln(Y)    fexp := regress(X, Y2, 1)    coeff := submatrix(fexp, 4, 5, 1, 1)

C := exp(coeff_1)    A := coeff_2    fitexp(x) := C·exp(A·x)    i := -2, -1.9 .. 4

C = 1.6    A = 1.3

[Figure: plot of the data with the 2nd order and exponential fits, x from -2 to 4, y from 0 to 300.]
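
The linearization steps can also be sketched in Python, using the same generated data as the worksheet (x from -2 to 4 in steps of 0.25, y = 1.6·exp(1.3·x)). The function name `fit_exponential` is made up for illustration, and the 2x2 system is solved with Cramer's rule rather than Mathcad's regress.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = C * exp(A * x) by data linearization (requires every y > 0)."""
    X = list(xs)                      # X = x
    Y = [math.log(y) for y in ys]     # Y = ln(y)
    n = len(X)
    sX, sY = sum(X), sum(Y)
    sXX = sum(v * v for v in X)
    sXY = sum(v * w for v, w in zip(X, Y))
    det = n * sXX - sX * sX
    A = (n * sXY - sX * sY) / det     # slope of Y = A*X + B
    B = (sY * sXX - sX * sXY) / det   # intercept, B = ln(C)
    return math.exp(B), A             # C = e^B

xs = [-2 + 0.25 * k for k in range(25)]        # -2, -1.75, ..., 4
ys = [1.6 * math.exp(1.3 * x) for x in xs]
C, A = fit_exponential(xs, ys)
print(round(C, 6), round(A, 6))   # 1.6 1.3
```

One design point to keep in mind: linearization minimizes the squared error in ln(y), not in y itself, so for noisy data the result generally differs somewhat from a direct nonlinear least squares fit.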