Least Squares Approximation and Linear Regression: Finding the Best Approximate Solution, Study notes of Linear Algebra

Covers the least squares approximation theorem and its proof. It explains how to find the best approximate solution of a matrix equation $AX = b$ by projecting $b$ onto the column space $W$ of $A$ and finding the solution $X$ of $AX = \mathrm{proj}_W b$. The document also introduces approximating an unknown function by its best approximation as a linear combination of known functions, and works an example that finds the quadratic function best fitting a set of data points.

Typology: Study notes

Pre 2010

Uploaded on 04/12/2010

koofers-user-50g
Math 311 Lecture 22
Least squares approximation
NOTE. For column vectors $u, v$: the dot product $v \cdot u$ equals the matrix product $v^Tu$.
Let A be a matrix.
Let W = the column space of A = the space spanned by
the columns of A.
THEOREM. AX = b has a solution iff b is in the column
space of A.
PROOF. Suppose $v_1, v_2, \ldots, v_n$ are the columns of $A$ and suppose $X = [x_1, x_2, \ldots, x_n]^T$. Then $A = (v_1 \mid v_2 \mid \cdots \mid v_n)$ and
$$AX = (v_1 \mid v_2 \mid \cdots \mid v_n)[x_1, x_2, \ldots, x_n]^T = x_1v_1 + x_2v_2 + \cdots + x_nv_n.$$
Hence $b = AX$
iff $b = x_1v_1 + x_2v_2 + \cdots + x_nv_n$
iff $b$ is a linear combination of the columns of $A$
iff $b$ is in the column space of $A$.
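The proof rests on the column view of $AX$: the product equals $x_1v_1 + \cdots + x_nv_n$. Below is a minimal pure-Python sketch that computes $AX$ once row by row and once as a combination of the columns, and shows that the two results agree. The helpers `matvec_rows` and `matvec_columns` and the sample matrix are illustrative and do not come from the notes.

```python
def matvec_rows(A, X):
    """Compute AX the usual way: entry i is (row i of A) . X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def matvec_columns(A, X):
    """Compute AX as x_1 v_1 + ... + x_n v_n, where v_j is column j of A."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        v_j = [A[i][j] for i in range(m)]  # column j of A
        for i in range(m):
            result[i] += X[j] * v_j[i]     # add x_j * v_j
    return result

A = [[1, 2], [3, 4], [5, 6]]  # hypothetical 3 x 2 matrix
X = [2, -1]
print(matvec_rows(A, X))      # [0, 2, 4]
print(matvec_columns(A, X))   # [0, 2, 4]
```

Every vector of the form $AX$ is therefore a combination of the columns, which is why the set of all $AX$ is exactly the column space $W$.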
Suppose $b$ is not in the column space $W$ of $A$. Thus $AX = b$ has no solution.
How close can one come to a solution? That is, which $X$ makes $AX$ as close as possible to $b$?
Since the vectors $AX$ are exactly the vectors of the column space $W$ of $A$, this is the same as asking which vector in $W$ is closest to $b$. The answer is $\mathrm{proj}_W b$, the projection of $b$ onto $W$.
Since $\mathrm{proj}_W b \in W$, the equation $AX = \mathrm{proj}_W b$ does have a solution $X$. This $X$ is the least-squares solution; it is the best approximate solution of $AX = b$.
We could find the least-squares solution by calculating $\mathrm{proj}_W b$ and then solving $AX = \mathrm{proj}_W b$. But there is an easier way.
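The longer route (project first, then solve) can be illustrated with the sketch below. Unlike the notes, it assumes the columns of $A$ are nonzero and mutually orthogonal. Under that assumption $\mathrm{proj}_W b = \sum_i \frac{b \cdot v_i}{v_i \cdot v_i} v_i$, and the coefficients $\frac{b \cdot v_i}{v_i \cdot v_i}$ are exactly the entries of the $X$ that solves $AX = \mathrm{proj}_W b$. The helper name and the vectors are hypothetical. For non-orthogonal columns this shortcut does not apply.

```python
from fractions import Fraction

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def project_onto_orthogonal_columns(columns, b):
    """Return (proj_W b, X) for W = span(columns).

    ASSUMPTION: the columns are nonzero and mutually orthogonal, so
    proj_W b = sum_i (b.v_i / v_i.v_i) v_i and those coefficients solve AX = proj_W b.
    """
    X = [Fraction(dot(b, v), dot(v, v)) for v in columns]
    proj = [sum(x * v[i] for x, v in zip(X, columns)) for i in range(len(b))]
    return proj, X

v1, v2 = [1, 1, 0], [1, -1, 0]  # orthogonal columns of a hypothetical A
b = [3, 1, 4]                   # not in W = span{v1, v2}
proj, X = project_onto_orthogonal_columns([v1, v2], b)
residual = [p - q for p, q in zip(b, proj)]
print(proj == [3, 1, 0], X == [2, 1])                     # True True
print(dot(residual, v1) == 0 and dot(residual, v2) == 0)  # True: b - proj_W b is orthogonal to W
```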
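THEOREM. If $A$ is an $m \times n$ matrix of rank $n$, the least-squares solution of $AX = b$ is the exact solution of the exact equation (the normal equations) $(A^TA)X = A^Tb$. Because $A$ has rank $n$, the $n \times n$ matrix $A^TA$ is invertible, so this solution is unique.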
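PROOF. Suppose $A = (v_1 \mid v_2 \mid \cdots \mid v_n)$ and suppose $X$ is the least-squares solution of $AX = b$. Then
$$
\begin{aligned}
X \text{ is the least-squares solution of } AX = b
&\iff AX = \mathrm{proj}_W b \quad \text{(by definition of ``least-squares'')}\\
&\iff b - AX = b - \mathrm{proj}_W b \text{ is orthogonal to the column space of } A\\
&\iff b - AX \text{ is perpendicular to each column } v_i \text{ of } A\\
&\iff v_i \cdot (b - AX) = 0 \text{ for each column } v_i\\
&\iff v_i^T (b - AX) = 0 \text{ for each column } v_i\\
&\iff \begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{pmatrix} (b - AX) = O\\
&\iff A^T (b - AX) = O\\
&\iff A^Tb - A^TAX = O\\
&\iff A^TAX = A^Tb \quad \text{(the exact equation).}
\end{aligned}
$$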
To solve $(A^TA)X = A^Tb$, first find $A^Tb$ and $A^TA$.
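The recipe above (form $A^Tb$ and $A^TA$, then solve) can be sketched in pure Python with exact `Fraction` arithmetic. This is a minimal illustration and not a library routine. It assumes $A$ has rank $n$, so that $A^TA$ is invertible, and it solves by plain Gauss-Jordan elimination. The helper names and the data points are hypothetical. The sketch also checks the condition derived in the proof, $A^T(b - AX) = O$.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def solve(M, y):
    """Solve the square system M z = y by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(y_i)] for row, y_i in zip(M, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)  # first nonzero pivot
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]                         # make the pivot 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]  # clear column col
    return [row[n] for row in aug]

def least_squares(A, b):
    """Least-squares solution of AX = b via the exact equation (A^T A) X = A^T b."""
    At = transpose(A)
    return solve(matmul(At, A), matvec(At, b))

# Hypothetical data: fit c + m t to the points (0, 6), (1, 0), (2, 0).
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
X = least_squares(A, b)
residual = [bi - axi for bi, axi in zip(b, matvec(A, X))]
print(X == [5, -3])                              # True
print(matvec(transpose(A), residual) == [0, 0])  # True: A^T (b - AX) = O
```

Exact fractions avoid rounding, which suits hand-sized examples. For larger or ill-conditioned problems, numerical libraries usually prefer factorizations such as QR or the SVD over forming $A^TA$ explicitly.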
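EXAMPLE. Suppose
$$b = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 0\\ 0 & 1\\ 0 & 0 \end{pmatrix}, \quad X = \begin{pmatrix} x\\ y \end{pmatrix}.$$
Find the best approximate solution of $AX = b$.
Solution: $A^Tb = [1,1]^T$ and $A^TA = I_2$. Hence $(A^TA)X = A^Tb$ becomes $I_2X = [1,1]^T$, so $X = [1,1]^T$.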
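The arithmetic of this example can be checked with a few lines of plain Python (the variable names are illustrative). The sketch forms $A^T$, $A^TA$ and $A^Tb$. Because $A^TA = I_2$ here, the exact equation gives $X = A^Tb$ directly.

```python
A = [[1, 0], [0, 1], [0, 0]]
b = [1, 1, 1]

At = [list(col) for col in zip(*A)]  # A^T, a 2 x 3 matrix
AtA = [[sum(p * q for p, q in zip(row, col)) for col in zip(*A)] for row in At]
Atb = [sum(p * x for p, x in zip(row, b)) for row in At]
print(AtA)  # [[1, 0], [0, 1]], i.e. I_2
print(Atb)  # [1, 1]

X = Atb     # valid only because A^T A = I_2 in this example
print(X)    # [1, 1]
```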
The error vector of an approximate solution to $AX = b$ is the difference $e = b - AX$ between the desired value $b$ and the approximate value $AX$. The least-squares solution has the smallest error, i.e., $\|e\|$ is minimal.
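For the example above ($b = [1,1,1]^T$ and $A$ with columns $[1,0,0]^T$ and $[0,1,0]^T$), the short sketch below compares $\|e\|$ for the least-squares solution $X = [1,1]^T$ with $\|e\|$ for another candidate, $X = [2,0]^T$, which is chosen arbitrarily for comparison. Every $X = [x,y]^T$ gives $AX = [x,y,0]^T$, so the third entry of $e$ is always 1 and $\|e\| \ge 1$. The least-squares solution attains this minimum.

```python
import math

A = [[1, 0], [0, 1], [0, 0]]
b = [1, 1, 1]

def error_vector(X):
    """e = b - AX for a candidate solution X."""
    AX = [sum(a * x for a, x in zip(row, X)) for row in A]
    return [bi - axi for bi, axi in zip(b, AX)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

e_ls = error_vector([1, 1])     # least-squares solution from the example
e_other = error_vector([2, 0])  # an arbitrary other candidate
print(e_ls, norm(e_ls))         # [0, 0, 1] 1.0
print(e_other, norm(e_other))   # [-1, 1, 1] 1.7320508075688772
```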
Approximating functions
Suppose we know the values $f(t_1), f(t_2), \ldots, f(t_n)$ of an otherwise unknown function $f(t)$, and we wish to approximate $f$ as a linear combination $af_1(t) + bf_2(t) + cf_3(t)$ of three known functions $f_1, f_2, f_3$. That is, we wish to find the $X = [a, b, c]^T$ for which $af_1(t) + bf_2(t) + cf_3(t)$ best approximates $f(t)$ at the $n$ known values. Thus $X = [a, b, c]^T$ is the best approximate solution of
$$
\begin{aligned}
af_1(t_1) + bf_2(t_1) + cf_3(t_1) &= f(t_1)\\
af_1(t_2) + bf_2(t_2) + cf_3(t_2) &= f(t_2)\\
af_1(t_3) + bf_2(t_3) + cf_3(t_3) &= f(t_3)\\
&\;\;\vdots\\
af_1(t_n) + bf_2(t_n) + cf_3(t_n) &= f(t_n)
\end{aligned}
$$
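In matrix form:
$$
\begin{pmatrix}
f_1(t_1) & f_2(t_1) & f_3(t_1)\\
f_1(t_2) & f_2(t_2) & f_3(t_2)\\
f_1(t_3) & f_2(t_3) & f_3(t_3)\\
\vdots & \vdots & \vdots\\
f_1(t_n) & f_2(t_n) & f_3(t_n)
\end{pmatrix}
\begin{pmatrix} a\\ b\\ c \end{pmatrix}
=
\begin{pmatrix} f(t_1)\\ f(t_2)\\ f(t_3)\\ \vdots\\ f(t_n) \end{pmatrix}
$$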
Let $A$ be the first matrix, $X = [a, b, c]^T$, and $B$ the last vector. Hence we are trying to find the best approximate answer to $AX = B$. This is the exact solution of $A^TAX = A^TB$.
Once we have $X = [a, b, c]^T$, the approximating function is $af_1(t) + bf_2(t) + cf_3(t)$, and the error vector is
$$e = \begin{pmatrix} f(t_1) - (af_1(t_1) + bf_2(t_1) + cf_3(t_1))\\ \vdots\\ f(t_n) - (af_1(t_n) + bf_2(t_n) + cf_3(t_n)) \end{pmatrix},$$
i.e., the differences between $f(t_i)$ and $af_1(t_i) + bf_2(t_i) + cf_3(t_i)$.
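The whole recipe can be sketched for the three-function case used in these notes. The steps are: evaluate $f_1, f_2, f_3$ at the $t_i$ to build $A$, solve $A^TAX = A^TB$, then form $e$. As a swapped-in technique, the sketch solves the $3 \times 3$ exact equation with Cramer's rule rather than elimination. It assumes the columns of $A$ are linearly independent (so $\det(A^TA) \ne 0$) and that the function values are integers or `Fraction`s. The name `fit_three` and the test data are hypothetical. With values taken exactly from $t^2 - 3t + 2$, the fit should recover $[a,b,c]^T = [1,-3,2]^T$ with zero error.

```python
from fractions import Fraction

def det3(M):
    """Determinant of a 3 x 3 matrix (cofactor expansion along the first row)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_three(f1, f2, f3, ts, fs):
    """Best [a, b, c] for a*f1(t) + b*f2(t) + c*f3(t) ~ f(t) at the points t_i.

    Builds A, solves (A^T A) X = A^T B by Cramer's rule, and returns (X, e).
    """
    A = [[f1(t), f2(t), f3(t)] for t in ts]  # row i: f1(t_i), f2(t_i), f3(t_i)
    cols = list(zip(*A))                     # columns of A = rows of A^T
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    AtB = [sum(p * y for p, y in zip(ci, fs)) for ci in cols]
    D = det3(AtA)
    X = []
    for k in range(3):
        # Cramer's rule: replace column k of A^T A by A^T B.
        Mk = [[AtB[r] if j == k else AtA[r][j] for j in range(3)] for r in range(3)]
        X.append(Fraction(det3(Mk), D))
    e = [y - sum(x * v for x, v in zip(X, row)) for y, row in zip(fs, A)]  # error vector
    return X, e

ts = [0, 1, 2, 3]
fs = [t * t - 3 * t + 2 for t in ts]        # exact values of t^2 - 3t + 2
X, e = fit_three(lambda t: t * t, lambda t: t, lambda t: 1, ts, fs)
print(X == [1, -3, 2], e == [0, 0, 0, 0])   # True True
```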
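EXAMPLE. Find the quadratic function that best fits $\{(-2,6), (-1,2), (0,1), (1,2), (2,5)\}$. Also find the error vector.
Quadratic means $at^2 + bt + c$, so $f_1(t) = t^2$, $f_2(t) = t$, $f_3(t) = 1$.
For $(-2,6)$: $f_1(-2) = 4$, $f_2(-2) = -2$, $f_3(-2) = 1$, $f(-2) = 6$, and similarly for the other points.
Then $AX = B$ and the exact equation $A^TAX = A^TB$ are
$$
\begin{pmatrix} 4 & -2 & 1\\ 1 & -1 & 1\\ 0 & 0 & 1\\ 1 & 1 & 1\\ 4 & 2 & 1 \end{pmatrix}
\begin{pmatrix} a\\ b\\ c \end{pmatrix}
=
\begin{pmatrix} 6\\ 2\\ 1\\ 2\\ 5 \end{pmatrix},
\qquad
\begin{pmatrix} 34 & 0 & 10\\ 0 & 10 & 0\\ 10 & 0 & 5 \end{pmatrix}
\begin{pmatrix} a\\ b\\ c \end{pmatrix}
=
\begin{pmatrix} 48\\ -2\\ 16 \end{pmatrix}.
$$
Least-squares solution: $[a, b, c]^T = [8/7, -1/5, 32/35]^T$.
Answer: $\frac{8}{7}t^2 - \frac{1}{5}t + \frac{32}{35}$.
Error vector: $e = [0.11, -0.26, 0.086, 0.14, -0.086]^T$, with $\|e\| \approx 0.34$.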
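The numbers in this example can be checked with the sketch below, which uses exact `Fraction` arithmetic. The variable names are illustrative. The sketch rebuilds $A^TA$ and $A^TB$ from the data points and confirms that the stated $[8/7, -1/5, 32/35]^T$ satisfies the exact equation. It also prints the error vector in exact form, which is $\frac{1}{35}[4, -9, 3, 5, -3]^T$, and prints $\|e\|$ rounded to two decimals.

```python
from fractions import Fraction
import math

points = [(-2, 6), (-1, 2), (0, 1), (1, 2), (2, 5)]
A = [[t * t, t, 1] for t, _ in points]   # row i: f1(t_i) = t_i^2, f2(t_i) = t_i, f3(t_i) = 1
B = [y for _, y in points]

cols = list(zip(*A))                     # columns of A = rows of A^T
AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
AtB = [sum(p * y for p, y in zip(ci, B)) for ci in cols]
print(AtA)  # [[34, 0, 10], [0, 10, 0], [10, 0, 5]]
print(AtB)  # [48, -2, 16]

# Check that the stated solution satisfies the exact equation (A^T A) X = A^T B.
X = [Fraction(8, 7), Fraction(-1, 5), Fraction(32, 35)]
print([sum(m * x for m, x in zip(row, X)) for row in AtA] == AtB)  # True

# Error vector e = B - AX, exactly and as its length.
e = [y - sum(x * a for x, a in zip(X, row)) for y, row in zip(B, A)]
print([str(v) for v in e])                                # ['4/35', '-9/35', '3/35', '1/7', '-3/35']
print(round(math.sqrt(float(sum(v * v for v in e))), 2))  # 0.34
```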
