
Linear Estimation in
Hilbert Spaces
Sridhar Mahadevan
University of Massachusetts
©Sridhar Mahadevan: CMPSCI 689 p.1/28


Motivation

We have covered three approaches to parameter estimation: maximum likelihood, Bayesian estimation, and least squares.

In complex problems, maximum likelihood and Bayesian estimation are difficult to use, since they require knowledge of the full joint distribution. A common simplifying assumption is that the variables are Gaussian, so that the first- and second-order moments (means and covariances) fully define the distribution. Under this assumption, maximum likelihood coincides with the least-squares approach, which accounts for the richest set of practical applications of statistical estimation. To deal with time-series problems, least-squares estimation needs to be modified to work incrementally.


Motivation

Many real-world applications involve the following problem:

A series of incremental observations is made using some noisy measurement system. The goal is to continually estimate some underlying parameters, or hidden state. The aim is to solve this problem recursively, so that the estimates can be updated rapidly online. There are thousands of real-world applications: tracking a spacecraft, monitoring the US economy, robot navigation, automatic landing of a 747.

This problem was solved in 1960 by R. E. Kalman for a very general class of estimation problems involving time-varying processes. The solution, now called a Kalman filter, is a beautiful instance of minimum-variance unbiased estimation in Hilbert spaces. Kalman filters are used in many real-world systems and remain one of the most widely used estimation procedures.


Kalman Filter

[Figure: true, observed, and filtered signals plotted against axes x and y.]


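Projection onto a subspace

Given a subspace $M$ generated by $n$ linearly independent vectors $a_1, \dots, a_n$ (the columns of $A$), the vector $A\hat{x}$ in $M$ closest to $b$ is the one whose "error" vector $b - A\hat{x}$ is orthogonal to $M$. It minimizes $\|b - Ax\|^2$, which gives the normal equations:

$$A^T A \hat{x} = A^T b \quad\Longrightarrow\quad \hat{x} = (A^T A)^{-1} A^T b$$

In our simple robotics problem, $A = [1\;1\;1]^T$, and we get

$$[1\;1\;1]\begin{bmatrix}1\\1\\1\end{bmatrix}\hat{x} = [1\;1\;1]\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}, \qquad \hat{x} = \frac{b_1 + b_2 + b_3}{3}$$

In other words, the least-squares estimate is the average of the three measurements.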
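The following is a minimal NumPy sketch of the normal equations. It assumes $A$ has full column rank. The helper name `least_squares` and the three readings in `b` are made up for illustration and are not the slide's data. The sketch also checks the orthogonality of the residual numerically.

```python
import numpy as np


def least_squares(A, b):
    """Solve the normal equations A^T A x = A^T b (A must have full column rank)."""
    return np.linalg.solve(A.T @ A, A.T @ b)


# Hypothetical robotics-style example: three noisy readings of one scalar
# position, so A is a 3x1 column of ones. The readings are made-up numbers.
A = np.ones((3, 1))
b = np.array([1.9, 2.1, 2.3])
x_hat = least_squares(A, b)

# The residual b - A x_hat is orthogonal to the column space of A.
residual = b - A @ x_hat
print("estimate:", x_hat, "A^T residual:", A.T @ residual)
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one. For ill-conditioned $A$, a QR-based solver such as `np.linalg.lstsq` is the safer choice.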


Weighted Least-Squares

What if the measurements were not equally reliable, in the sense that some measurements were more reliable than others? In this case, we can modify the least-squares criterion by adding a weighting matrix $W$:

Minimize $\|W(Ax - b)\|^2$

which immediately leads to the weighted least-squares estimate

$$(WA)^T (WA)\,\hat{x} = (WA)^T W b \quad\Longleftrightarrow\quad A^T W^T W A\,\hat{x} = A^T W^T W b$$

If we define $C = W^T W$, we can simplify this to get

$$\hat{x} = (A^T C A)^{-1} A^T C\, b = L b$$

This is an example of a linear estimator (the estimate is a fixed matrix $L$ applied to the data $b$) that is of profound importance in practice.
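Here is a sketch of the weighted least-squares estimate under some stated assumptions. The random `A`, `b` and the diagonal reliability weights in `W` are hypothetical, and `weighted_least_squares` is an illustrative helper name. The sketch checks that the result equals ordinary least squares on the rescaled system $(WA)x \approx Wb$. It also checks that the estimator is the linear map $L$.

```python
import numpy as np


def weighted_least_squares(A, b, C):
    """Return x_hat = (A^T C A)^{-1} A^T C b for a symmetric positive-definite weight C."""
    return np.linalg.solve(A.T @ C @ A, A.T @ C @ b)


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))
b = rng.normal(size=6)

# Hypothetical per-measurement reliabilities: W is diagonal, so C = W^T W is too.
W = np.diag([1.0, 1.0, 2.0, 2.0, 0.5, 0.5])
C = W.T @ W
x_wls = weighted_least_squares(A, b, C)

# Same answer as ordinary least squares on the rescaled system (W A) x ~ W b.
x_scaled = np.linalg.lstsq(W @ A, W @ b, rcond=None)[0]

# The estimator is linear in b: x_hat = L b with L = (A^T C A)^{-1} A^T C.
L = np.linalg.solve(A.T @ C @ A, A.T @ C)
print(x_wls, x_scaled, L @ b)
```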


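Gauss-Markov Theorem

Gauss-Markov theorem: Suppose the measurements are generated by $b = Ax + e$, where $x$ is unknown but fixed and the noise $e$ has zero mean and covariance $V = E[ee^T]$. Among linear unbiased estimators $\hat{x} = Lb$, the minimum-variance one is obtained by picking $C = V^{-1}$ in weighted least squares:

$$\hat{x} = (A^T V^{-1} A)^{-1} A^T V^{-1} b$$

Proof: For a linear estimator $\hat{x} = Lb$ that is unbiased ($LA = I$), the error covariance, which we want to minimize, is

$$E[(x - \hat{x})(x - \hat{x})^T] = E[(x - Lb)(x - Lb)^T] = E[(x - LAx - Le)(x - LAx - Le)^T] = E[L e e^T L^T] = L E[ee^T] L^T = LVL^T$$

We need to show that $L_0 = (A^T V^{-1} A)^{-1} A^T V^{-1}$ gives a minimum-variance unbiased estimator (MVUE). Note that $L_0 A = I$, so it is unbiased. For any unbiased estimator $L$, write $L = L_0 + (L - L_0)$:

$$P = LVL^T = L_0 V L_0^T + (L - L_0) V L_0^T + L_0 V (L - L_0)^T + (L - L_0) V (L - L_0)^T$$

The cross terms vanish, because $(L - L_0) V L_0^T = (L - L_0) A (A^T V^{-1} A)^{-1} = (I - I)(A^T V^{-1} A)^{-1} = 0$. Hence

$$P = L_0 V L_0^T + (L - L_0) V (L - L_0)^T \qquad (\text{minimized at } L = L_0)$$

since the second term is positive semidefinite. Note that $P = L_0 V L_0^T = (A^T V^{-1} A)^{-1}$, so when $V = I$, this reduces to standard least squares.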
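A numerical sanity check of the theorem follows, as a sketch that assumes a randomly generated design `A` and a hypothetical positive-definite noise covariance `V`. The sketch compares $L_0$ with one other unbiased estimator, ordinary least squares. It checks the proof's decomposition $P_{ols} = P_0 + (L_{ols} - L_0)V(L_{ols} - L_0)^T$ and confirms that the gap is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.normal(size=(m, n))
# Hypothetical noise covariance V: any symmetric positive-definite matrix works.
B = rng.normal(size=(m, m))
V = B @ B.T + m * np.eye(m)
V_inv = np.linalg.inv(V)

# Gauss-Markov estimator L0 = (A^T V^-1 A)^-1 A^T V^-1.
L0 = np.linalg.solve(A.T @ V_inv @ A, A.T @ V_inv)
# Another unbiased linear estimator: ordinary least squares (A^T A)^-1 A^T.
L_ols = np.linalg.solve(A.T @ A, A.T)


def error_cov(L):
    """Error covariance L V L^T of an unbiased linear estimator x_hat = L b."""
    return L @ V @ L.T


P0 = error_cov(L0)
P_ols = error_cov(L_ols)
# P_ols - P0 should equal (L_ols - L0) V (L_ols - L0)^T, a PSD matrix.
gap_eigs = np.linalg.eigvalsh(P_ols - P0)
print("smallest eigenvalue of P_ols - P0:", gap_eigs.min())
```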


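Recursive Least-Squares

Suppose we estimate $\hat{x}_0$ from measurements $b_0$ (noise covariance $V_0$) by approximating $A_0 \hat{x}_0 \approx b_0$. Given a new dataset $b_1$, with $A_1 x \approx b_1$ and noise covariance $V_1$, how can we compute a revised estimate $\hat{x}_1$ from all the data without redoing the previous computations? Stack the system as $A = [A_0;\, A_1]$, $b = [b_0;\, b_1]$, with a block-diagonal $V$ (old and new measurement errors are uncorrelated). The information matrix, or inverse of the error covariance matrix, is

$$P_1^{-1} = A^T V^{-1} A = \begin{bmatrix} A_0 \\ A_1 \end{bmatrix}^T \begin{bmatrix} V_0 & 0 \\ 0 & V_1 \end{bmatrix}^{-1} \begin{bmatrix} A_0 \\ A_1 \end{bmatrix} = A_0^T V_0^{-1} A_0 + A_1^T V_1^{-1} A_1 = P_0^{-1} + A_1^T V_1^{-1} A_1$$

Since the normal equations are $A^T V^{-1} A \hat{x} = A^T V^{-1} b$, we have $A_0^T V_0^{-1} b_0 = P_0^{-1} \hat{x}_0 = (P_1^{-1} - A_1^T V_1^{-1} A_1)\hat{x}_0$. Therefore

$$\begin{aligned}
\hat{x}_1 &= P_1 \begin{bmatrix} A_0 \\ A_1 \end{bmatrix}^T V^{-1} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = P_1 \big( A_0^T V_0^{-1} b_0 + A_1^T V_1^{-1} b_1 \big) \\
&= P_1 \big( P_1^{-1} \hat{x}_0 - A_1^T V_1^{-1} A_1 \hat{x}_0 + A_1^T V_1^{-1} b_1 \big) \\
&= \hat{x}_0 + P_1 A_1^T V_1^{-1} \big( b_1 - A_1 \hat{x}_0 \big)
\end{aligned}$$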
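The update derived above can be sketched in NumPy and checked against a batch solve of the stacked system. All data here are random and hypothetical, with diagonal noise covariances, and `rls_update` and `gls` are illustrative helper names. The batch solve uses weighted least squares with $C = V^{-1}$.

```python
import numpy as np


def rls_update(x0, P0, A1, b1, V1):
    """One recursive least-squares step following the derivation above.

    x0, P0 : previous estimate and its error covariance (inverse information matrix)
    A1, b1 : new measurement matrix and data, with noise covariance V1
    """
    V1_inv = np.linalg.inv(V1)
    # Information matrices add: P1^-1 = P0^-1 + A1^T V1^-1 A1.
    P1 = np.linalg.inv(np.linalg.inv(P0) + A1.T @ V1_inv @ A1)
    # Correct the old estimate by the gain times the innovation b1 - A1 x0.
    K1 = P1 @ A1.T @ V1_inv
    x1 = x0 + K1 @ (b1 - A1 @ x0)
    return x1, P1


def gls(A, b, V):
    """Batch weighted least squares with C = V^-1; returns estimate and error covariance."""
    V_inv = np.linalg.inv(V)
    P = np.linalg.inv(A.T @ V_inv @ A)
    return P @ A.T @ V_inv @ b, P


rng = np.random.default_rng(2)
n = 3
A0, A1 = rng.normal(size=(5, n)), rng.normal(size=(4, n))
b0, b1 = rng.normal(size=5), rng.normal(size=4)
V0 = np.diag(rng.uniform(0.5, 2.0, 5))
V1 = np.diag(rng.uniform(0.5, 2.0, 4))

# Estimate from the first batch only, then update with the second batch.
x0, P0 = gls(A0, b0, V0)
x1, P1 = rls_update(x0, P0, A1, b1, V1)

# Batch solution on the stacked system with block-diagonal noise covariance.
A = np.vstack([A0, A1])
b = np.concatenate([b0, b1])
V = np.block([[V0, np.zeros((5, 4))], [np.zeros((4, 5)), V1]])
x_batch, P_batch = gls(A, b, V)
print(x1, x_batch)
```

The explicit inverses keep the sketch close to the slide's formulas. A production version would more likely factor the matrices than invert them.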


Hilbert Space of Random Variables

Suppose $x_1, \dots, x_n$ is a set of scalar random variables. The Hilbert space defined by them is the set of all linear combinations $y = \sum_i \alpha_i x_i$, where the inner product between $y_1 = \sum_i \alpha_i x_i$ and $y_2 = \sum_j \beta_j x_j$ is defined as

$$\langle y_1, y_2 \rangle = E[y_1 y_2] = E\Big[\Big(\sum_i \alpha_i x_i\Big)\Big(\sum_j \beta_j x_j\Big)\Big]$$

We assume $E[y_i^2] = \|y_i\|^2 < \infty$, so variables are of finite length.

Let $z_1, \dots, z_m$ be a set of vector random variables, each of which has dimension $n$. The Hilbert space consists of linear combinations of these variables, defined as

$$y = K_1 z_1 + \dots + K_m z_m$$

where each $K_i$ is a real-valued $n \times n$ matrix. For two elements $x$ and $z$ of this space, with components $x_i$ and $z_i$, the inner product is defined as

$$\langle x, z \rangle = E\Big[\sum_i x_i z_i\Big] = \operatorname{Trace} E[x z^T]$$

The length or norm of a vector random variable is then given by $\|x\|^2 = \operatorname{Trace} E[x x^T]$.
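The inner products above can be checked concretely under one simplifying assumption: the random vector takes one of a few equally likely values, so every expectation is an exact average. The outcomes, coefficients, and matrices are random and hypothetical. `expect` and `outer_moment` are illustrative helpers. The sketch shows $\langle y_1, y_2\rangle = \alpha^T E[xx^T]\beta$ in the scalar case and $E[\sum_i u_i z_i] = \operatorname{Trace} E[u z^T]$ in the vector case.

```python
import numpy as np

# Hypothetical setup: the random vector x = (x_1, x_2, x_3) takes one of
# N_OUTCOMES equally likely values (the rows of X), so every expectation is an
# exact average and the identities below hold up to rounding error.
rng = np.random.default_rng(3)
N_OUTCOMES = 5
X = rng.normal(size=(N_OUTCOMES, 3))


def expect(values):
    """E[.] under the uniform distribution over the outcomes (axis 0)."""
    return values.mean(axis=0)


def outer_moment(U, Z):
    """E[u z^T] for random vectors whose outcomes are the rows of U and Z."""
    return expect(np.einsum("ki,kj->kij", U, Z))


# Scalar case: y1 = sum_i alpha_i x_i and y2 = sum_j beta_j x_j.
alpha = np.array([1.0, -2.0, 0.5])
beta = np.array([0.3, 0.0, 1.0])
y1, y2 = X @ alpha, X @ beta
inner_scalar = expect(y1 * y2)          # <y1, y2> = E[y1 y2]
R = outer_moment(X, X)                  # second-moment matrix E[x x^T]

# Vector case: u = K1 x and z = K2 x are elements of the vector Hilbert space.
K1, K2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
U, Z = X @ K1.T, X @ K2.T               # row k holds K1 x_k and K2 x_k
inner_vector = expect(np.sum(U * Z, axis=1))   # E[sum_i u_i z_i]
trace_form = np.trace(outer_moment(U, Z))      # Trace E[u z^T]
norm_sq = np.trace(outer_moment(U, U))         # ||u||^2 = Trace E[u u^T]
print(inner_scalar, alpha @ R @ beta, inner_vector, trace_form)
```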


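Gauss-Markov Revisited

Let us bring in the perspective of Hilbert spaces. The task is to find the minimum-variance unbiased linear estimator $\hat{x} = Lb$, where the measurements are noisy and produced by the process $b = Ax + e$. Geometrically speaking, we want the vector $\hat{x}$ with minimum expected squared length of the error:

$$E\|\hat{x} - x\|^2 = E\|x - Lb\|^2 = E\|x - LAx - Le\|^2 = E\langle x - LAx - Le,\; x - LAx - Le \rangle = \|x - LAx\|^2 + \operatorname{Trace}(LVL^T) = \operatorname{Trace}(LVL^T)$$

The cross terms vanish because $E[e] = 0$, and the last step uses unbiasedness, $LA = I$.

Let $l_i$ be the $i$th row of $L$ (written as a column vector), and $a_j$ be the $j$th column of $A$. Since $\operatorname{Trace}(LVL^T) = \sum_i l_i^T V l_i$ and $LA = I$ means $l_i^T a_j = \delta_{ij}$, we obtain a set of separate minimization problems:

Minimize $l_i^T V l_i$ subject to $l_i^T a_j = \delta_{ij}$

If we define the inner product $\langle x, y \rangle_V = x^T V y$, we get a set of minimum-norm problems with constraints:

Minimize $\|l_i\|_V$ subject to $\langle l_i, V^{-1} a_j \rangle_V = \delta_{ij}$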


Projection Theorem for Affine Subspaces

Consider now an affine subspace $M_Y$, which consists of all vectors $x \in H$ satisfying constraints of the form (as we will see soon, kernel methods use this formulation):

$$\langle x, y_1 \rangle = c_1, \quad \dots, \quad \langle x, y_n \rangle = c_n$$

where $y_1, \dots, y_n$ are $n$ linearly independent vectors forming an $n$-dimensional subspace $Y$ of $H$. Suppose $c_i = 0$ for all $i$. Then $M_Y = Y^\perp$, where $Y^\perp$ is the orthogonal complement of $Y$. So, in the general case of nonzero $c_i$, the affine subspace $M_Y$ is the orthogonal complement $Y^\perp$ translated by a fixed vector that satisfies the constraints (for example, the minimum-norm vector $x_o$ in the theorem below). This translation is determined by $[c_1, \dots, c_n]$.

We say that an affine subspace $M_Y$ is of co-dimension $n$ if it is the orthogonal complement (perhaps translated) of a subspace of dimension $n$.

Projection Theorem: Let $H$ be a Hilbert space and $Y$ be a finite-dimensional subspace generated by $n$ linearly independent vectors $y_1, \dots, y_n$. Among all vectors $x$ satisfying the above constraints, there exists a unique vector $x_o = \sum_i \beta_i y_i$ with minimum norm, whose coefficients satisfy the equations

$$\begin{aligned}
\langle y_1, y_1 \rangle \beta_1 + \dots + \langle y_n, y_1 \rangle \beta_n &= c_1 \\
&\;\;\vdots \\
\langle y_1, y_n \rangle \beta_1 + \dots + \langle y_n, y_n \rangle \beta_n &= c_n
\end{aligned}$$
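The Gram-matrix equations of the projection theorem can be sketched as a small solver for the inner product $\langle u, v \rangle_M = u^T M v$, with `min_norm_affine` as a hypothetical helper name. The sketch applies it to the row problems of the Gauss-Markov Revisited slide, using constraint vectors $V^{-1}a_j$, $c = e_i$, and $M = V$, and compares the result with $L_0 = (A^T V^{-1} A)^{-1} A^T V^{-1}$. `A` and `V` are random, hypothetical inputs.

```python
import numpy as np


def min_norm_affine(Y, c, M):
    """Minimum-norm x with <x, y_k>_M = c_k for each column y_k of Y.

    Uses the inner product <u, v>_M = u^T M v (M symmetric positive definite).
    By the projection theorem, x_o = sum_i beta_i y_i, where the Gram system
    sum_i <y_i, y_k>_M beta_i = c_k determines the coefficients.
    """
    G = Y.T @ M @ Y                  # G[k, i] = <y_k, y_i>_M (symmetric)
    beta = np.linalg.solve(G, c)
    return Y @ beta


rng = np.random.default_rng(4)
m, n = 7, 3
A = rng.normal(size=(m, n))
B = rng.normal(size=(m, m))
V = B @ B.T + m * np.eye(m)          # hypothetical noise covariance
V_inv = np.linalg.inv(V)

# Row problems: minimize ||l_i||_V subject to <l_i, V^-1 a_j>_V = delta_ij.
Y = V_inv @ A                        # columns are V^-1 a_j
L = np.vstack([min_norm_affine(Y, np.eye(n)[i], V) for i in range(n)])

# Compare with the Gauss-Markov estimator L0 = (A^T V^-1 A)^-1 A^T V^-1.
L0 = np.linalg.solve(A.T @ V_inv @ A, A.T @ V_inv)
print(np.abs(L - L0).max())
```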


Improving on Gauss-Markov

In the Gauss-Markov formulation, we assumed that the vector $x$ to be estimated was completely unknown, but fixed. We now show that if we instead treat $x$ as a random vector whose statistics are known, we can get a minimum-variance estimator with lower variance than that produced by the Gauss-Markov estimator.

Theorem: Let $b$ and $x$ be random vectors, where $b = Ax + e$. The minimum-variance linear estimate $\hat{x}$ of $x$ is given by

$$\hat{x} = E[xb^T]\,\big(E[bb^T]\big)^{-1} b$$

and the covariance matrix of the error is given by

$$E[(\hat{x} - x)(\hat{x} - x)^T] = E[xx^T] - E[\hat{x}\hat{x}^T] = E[xx^T] - E[xb^T]\,\big(E[bb^T]\big)^{-1} E[bx^T]$$
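As a sketch under stated assumptions, the theorem can be exercised directly on second moments. The joint second-moment matrix of $(x, b)$ below is a random, hypothetical positive-definite matrix, and `min_variance_estimator` and `error_cov` are illustrative names. The gain $K = E[xb^T](E[bb^T])^{-1}$ makes the error orthogonal to the data, $E[(x - Kb)b^T] = 0$. Any other gain $G$ gives a larger error covariance.

```python
import numpy as np


def min_variance_estimator(Sxb, Sbb):
    """Gain K = E[x b^T] (E[b b^T])^-1 of the linear minimum-variance estimate x_hat = K b."""
    # Solve Sbb^T K^T = Sxb^T instead of forming the inverse explicitly.
    return np.linalg.solve(Sbb.T, Sxb.T).T


rng = np.random.default_rng(5)
n, m = 2, 4
J = rng.normal(size=(n + m, n + m))
S = J @ J.T + np.eye(n + m)          # hypothetical joint second moments of (x, b)
Sxx, Sxb = S[:n, :n], S[:n, n:]
Sbx, Sbb = S[n:, :n], S[n:, n:]

K = min_variance_estimator(Sxb, Sbb)


def error_cov(G):
    """E[(x - G b)(x - G b)^T] for an arbitrary linear gain G."""
    return Sxx - G @ Sbx - Sxb @ G.T + G @ Sbb @ G.T


P = error_cov(K)
print("orthogonality residual:", np.abs(Sxb - K @ Sbb).max())
```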


Minimum Variance Estimate

How do we know that we have indeed found a lower-variance estimator?

Corollary: Let $b = Ax + e$, where $b$ is an observed $m$-dimensional data vector, $x$ is an unknown $n$-dimensional random vector with known covariance $R = E[xx^T]$, and $e$ is an unknown $m$-dimensional random vector with known covariance $V = E[ee^T]$. Further, assume that $E[xe^T] = 0$, so the errors are uncorrelated with the unknown $x$. The minimum-variance linear estimate $\hat{x}$ minimizing $E\|\hat{x} - x\|^2$ is given by

$$\hat{x} = RA^T (ARA^T + V)^{-1} b$$

with error covariance

$$E[(\hat{x} - x)(\hat{x} - x)^T] = R - RA^T (ARA^T + V)^{-1} AR$$

Proof: Note that the covariance of $b$ is now

$$E[bb^T] = E[(Ax + e)(Ax + e)^T] = ARA^T + V$$

and that $E[xb^T] = E[x(Ax + e)^T] = RA^T$. Applying the above minimum-variance theorem gives the form of the linear estimator above.


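Minimum Variance Estimator

To check that we have a lower-variance estimator, we rewrite the above corollary in an alternate form. Note the matrix identity

$$RA^T (ARA^T + V)^{-1} = (A^T V^{-1} A + R^{-1})^{-1} A^T V^{-1}$$

which can be shown by postmultiplying both sides by $(ARA^T + V)$ and premultiplying by $(A^T V^{-1} A + R^{-1})$. This implies that the minimum-variance estimator can be written as

$$\hat{x} = RA^T (ARA^T + V)^{-1} b = (A^T V^{-1} A + R^{-1})^{-1} A^T V^{-1} b$$

The error covariance can be rewritten similarly as

$$E[(\hat{x} - x)(\hat{x} - x)^T] = R - RA^T (ARA^T + V)^{-1} AR = R - (A^T V^{-1} A + R^{-1})^{-1} A^T V^{-1} A R = (A^T V^{-1} A + R^{-1})^{-1}$$

after some simplification. Since $R^{-1}$ is positive definite, $(A^T V^{-1} A + R^{-1})^{-1} \preceq (A^T V^{-1} A)^{-1}$, which is the Gauss-Markov error covariance. This is why the new estimator has lower variance.

The Gauss-Markov estimate is the limiting case of the minimum-variance estimate as $R^{-1} \to 0$, that is, when there is no prior information about $x$.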
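The following numerical sketch checks the claims of this slide. The design `A`, the diagonal noise covariance `V`, and the prior covariance `R` are hypothetical values, and `mv_gain` is an illustrative helper. The sketch verifies the matrix identity and the simplified error covariance. It also checks that the Gauss-Markov covariance exceeds the minimum-variance one and that the gain approaches the Gauss-Markov gain as the prior weakens ($R = rI$ with large $r$).

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 6, 2
A = rng.normal(size=(m, n))
V = np.diag(rng.uniform(0.5, 2.0, m))   # hypothetical noise covariance
R = np.array([[2.0, 0.3], [0.3, 1.0]])  # hypothetical prior covariance of x
V_inv, R_inv = np.linalg.inv(V), np.linalg.inv(R)

# Two forms of the minimum-variance gain, related by the matrix identity.
gain_cov_form = R @ A.T @ np.linalg.inv(A @ R @ A.T + V)
info = A.T @ V_inv @ A + R_inv
gain_info_form = np.linalg.solve(info, A.T @ V_inv)

# Two forms of the error covariance, plus the Gauss-Markov covariance.
P_mv = R - gain_cov_form @ A @ R
P_info = np.linalg.inv(info)
P_gm = np.linalg.inv(A.T @ V_inv @ A)
L_gm = P_gm @ A.T @ V_inv               # Gauss-Markov gain


def mv_gain(r):
    """Information-form gain with prior R = r I (large r means a weak prior)."""
    return np.linalg.solve(A.T @ V_inv @ A + np.eye(n) / r, A.T @ V_inv)


print("eigenvalues of P_gm - P_mv:", np.linalg.eigvalsh(P_gm - P_mv))
```

The information form only needs an $n \times n$ solve, while the covariance form needs an $m \times m$ one. This is one practical reason to prefer whichever form has the smaller dimension.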
