




















These notes cover linear estimation in Hilbert spaces, focusing on minimum variance unbiased estimation and the Kalman filter: motivation, least-squares estimation, projection onto a subspace, weighted least squares, the Gauss-Markov theorem, and the recursive least-squares algorithm and its connection to Hilbert spaces.





















Sridhar Mahadevan
University of Massachusetts
©
Sridhar Mahadevan: CMPSCI 689 – p.1/
We have covered three approaches to parameter estimation: maximum likelihood, Bayesian estimation, and least squares.
In complex problems, maximum likelihood and Bayesian estimation are difficult to use, since they require knowledge of the full joint distribution. A common simplifying assumption is that the variables are Gaussian, so that the first- and second-order moments fully define the distribution. This assumption coincides with the least-squares approach, which has the richest set of practical applications of statistical estimation. To handle time-series problems, least-squares estimation must be modified to work incrementally.
Many real-world applications involve the following problem. A series of incremental observations is made using some noisy measurement system. The goal is to continually estimate some underlying parameters, or hidden state. The aim is to solve this problem recursively, so that the estimates can be updated extremely rapidly online. There are thousands of real-world applications: tracking a spacecraft, monitoring the US economy, robot navigation, automatic landing of a 747, and many others.
This problem was solved in 1960 by R. E. Kalman for a very general class of estimation problems involving time-varying processes. The solution, now called the Kalman filter, is a beautiful instance of minimum-variance unbiased estimation in Hilbert spaces. Kalman filters are used in many real-world systems and remain one of the most widely used estimation procedures.
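Given a subspace $S$ generated by $n$ linearly independent vectors $a_1, \ldots, a_n$ (the columns of $A$), the vector $A\hat{x}$ in $S$ closest to $b$ minimizes the length of the orthogonal "error" vector $b - Ax$:
$$\min_x \|b - Ax\|^2 \quad\Longrightarrow\quad A^T A \hat{x} = A^T b \quad\Longrightarrow\quad \hat{x} = (A^T A)^{-1} A^T b.$$
In our simple robotics problem, we get $\hat{x} = \ldots$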
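As a minimal sketch of this projection (assuming NumPy and a small made-up line-fitting problem, not data from these notes), the normal equations $A^T A \hat{x} = A^T b$ can be solved directly; the residual $b - A\hat{x}$ should then be orthogonal to every column of $A$. The helper name `least_squares` is hypothetical.

```python
import numpy as np

def least_squares(A, b):
    """Solve the normal equations A^T A x = A^T b (A must have independent columns)."""
    return np.linalg.solve(A.T @ A, A.T @ b)

# Hypothetical example: fit a line c + d*t to four noisy points.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])   # columns a_1 = 1, a_2 = t
b = np.array([1.1, 1.9, 3.2, 3.9])

x_hat = least_squares(A, b)
residual = b - A @ x_hat

print(x_hat)
print(A.T @ residual)  # numerically ~0: the error is orthogonal to the column space
```

Solving the normal equations is fine for small, well-conditioned problems; `np.linalg.lstsq`, which is SVD based, is the more robust choice in practice.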
What if the measurements were not identical, in the sense that some measurements were more reliable than others? In this case, we can modify the least-squares criterion by adding a weighting matrix $W$:

Minimize $\|W(Ax - b)\|^2$,

which immediately leads to the weighted least-squares estimate
$$(WA)^T (WA)\hat{x} = (WA)^T W b \quad\Longleftrightarrow\quad A^T W^T W A \hat{x} = A^T W^T W b.$$
If we define $C = W^T W$, we can simplify this to get
$$\hat{x} = (A^T C A)^{-1} A^T C b = L b.$$
This is an example of a linear estimator that is of profound importance in practice.
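A short NumPy sketch of the weighted estimator, assuming three hypothetical readings of one scalar where the third is trusted much less. It forms $L = (A^T C A)^{-1} A^T C$ with $C = W^T W$ and checks that $LA = I$, which is what makes this linear estimator unbiased. The function name is hypothetical.

```python
import numpy as np

def weighted_least_squares_operator(A, W):
    """Return L = (A^T C A)^{-1} A^T C with C = W^T W, so that x_hat = L @ b."""
    C = W.T @ W
    return np.linalg.solve(A.T @ C @ A, A.T @ C)

# Hypothetical data: three measurements of one scalar, the last one much noisier.
A = np.array([[1.0], [1.0], [1.0]])
b = np.array([10.0, 10.2, 13.0])
W = np.diag([1.0, 1.0, 0.1])   # small weight = less trusted measurement

L = weighted_least_squares_operator(A, W)
x_hat = L @ b
print(x_hat)      # pulled toward the two reliable readings
print(L @ A)      # identity: L A = I
```

With $W = I$ the same code returns the ordinary least-squares estimate (here, the plain mean of the readings).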
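Gauss-Markov theorem: Suppose the measurements are generated by $b = Ax + e$, where $x$ is fixed but unknown and the noise $e$ has zero mean and known covariance $V = E[ee^T]$. Pick $C = V^{-1}$ in the weighted least-squares estimate, and
$$\hat{x} = (A^T V^{-1} A)^{-1} A^T V^{-1} b.$$
Then $\hat{x}$ is the minimum variance unbiased estimator (MVUE) among all linear estimators $\hat{x} = Lb$.

Proof: A linear estimator $\hat{x} = Lb$ is unbiased when $E[\hat{x}] = LAx = x$ for every $x$, that is, when $LA = I$. Since we are minimizing the variance, we look at the error covariance
$$E[(x - \hat{x})(x - \hat{x})^T] = E[(x - Lb)(x - Lb)^T] = E[(x - LAx - Le)(x - LAx - Le)^T] = L E[ee^T] L^T = L V L^T,$$
where the third equality uses $LA = I$.
We need to show that
$$L_0 = (A^T V^{-1} A)^{-1} A^T V^{-1}$$
is an MVUE. Note that $L_0 A = I$, so it is unbiased. For any unbiased estimator $L$, write $L = L_0 + (L - L_0)$:
$$L V L^T = L_0 V L_0^T + (L - L_0) V L_0^T + L_0 V (L - L_0)^T + (L - L_0) V (L - L_0)^T.$$
The cross terms vanish because $(L - L_0) V L_0^T = (L - L_0) A (A^T V^{-1} A)^{-1} = (I - I)(A^T V^{-1} A)^{-1} = 0$. Hence
$$L V L^T = L_0 V L_0^T + (L - L_0) V (L - L_0)^T,$$
and since the second term is positive semidefinite, $L V L^T$ is minimized at $L = L_0$.
Note that $L_0 V L_0^T = (A^T V^{-1} A)^{-1}$, so when $V = I$ (or any multiple of $I$), this reduces to standard least squares.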
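The proof can be checked numerically. This NumPy sketch assumes a random hypothetical $A$ and diagonal $V$, builds $L_0$, and builds a competing unbiased estimator $L = L_0 + K(I - AL_0)$ for an arbitrary $K$ (a construction chosen here for illustration; it satisfies $LA = I$ because $AL_0A = A$). The difference $LVL^T - L_0VL_0^T$ should be positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem: 6 measurements, 2 unknowns, known noise covariance V.
m, n = 6, 2
A = rng.normal(size=(m, n))
V = np.diag(rng.uniform(0.5, 3.0, size=m))
V_inv = np.linalg.inv(V)

# Gauss-Markov estimator L0 = (A^T V^{-1} A)^{-1} A^T V^{-1}
L0 = np.linalg.solve(A.T @ V_inv @ A, A.T @ V_inv)

# Another unbiased linear estimator: L = L0 + K (I - A L0) still satisfies L A = I.
K = rng.normal(size=(n, m))
L = L0 + K @ (np.eye(m) - A @ L0)

P0 = L0 @ V @ L0.T     # error covariance of the Gauss-Markov estimator
P = L @ V @ L.T        # error covariance of the competitor

print(np.allclose(L0 @ A, np.eye(n)), np.allclose(L @ A, np.eye(n)))
print(np.allclose(P0, np.linalg.inv(A.T @ V_inv @ A)))
print(np.linalg.eigvalsh(P - P0).min() >= -1e-10)  # P - P0 is positive semidefinite
```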
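Suppose we estimate $x$ from measurements $b_0$ by approximating $A_0 x \approx b_0$, where the noise has covariance $V_0$. This gives the estimate $\hat{x}_0$ with error covariance $P_0 = (A_0^T V_0^{-1} A_0)^{-1}$. Given a new dataset $b_1$, with $A_1 x \approx b_1$ and noise covariance $V_1$, how can we compute a revised estimate $\hat{x}_1$ from
$$\begin{bmatrix} A_0 \\ A_1 \end{bmatrix} x \approx \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$$
without redoing the previous computations? The information matrix, or inverse of the error covariance matrix, is
$$P_1^{-1} = \begin{bmatrix} A_0 \\ A_1 \end{bmatrix}^T \begin{bmatrix} V_0 & 0 \\ 0 & V_1 \end{bmatrix}^{-1} \begin{bmatrix} A_0 \\ A_1 \end{bmatrix} = A_0^T V_0^{-1} A_0 + A_1^T V_1^{-1} A_1 = P_0^{-1} + A_1^T V_1^{-1} A_1.$$
Note that since the normal equations are
$$A^T V^{-1} A \hat{x} = A^T V^{-1} b,$$
we have $A_0^T V_0^{-1} b_0 = P_0^{-1} \hat{x}_0 = (P_1^{-1} - A_1^T V_1^{-1} A_1)\hat{x}_0$, and therefore
$$\hat{x}_1 = P_1 \begin{bmatrix} A_0 \\ A_1 \end{bmatrix}^T \begin{bmatrix} V_0 & 0 \\ 0 & V_1 \end{bmatrix}^{-1} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = P_1\left(A_0^T V_0^{-1} b_0 + A_1^T V_1^{-1} b_1\right)$$
$$= P_1\left(P_1^{-1}\hat{x}_0 - A_1^T V_1^{-1} A_1 \hat{x}_0 + A_1^T V_1^{-1} b_1\right) = \hat{x}_0 + P_1 A_1^T V_1^{-1}\left(b_1 - A_1 \hat{x}_0\right).$$
The new estimate is the old one corrected by the prediction error $b_1 - A_1\hat{x}_0$ on the new data.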
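A minimal NumPy sketch of this recursive update, assuming random hypothetical batches. It computes $\hat{x}_0, P_0$ from the first batch, folds in the second batch with $P_1^{-1} = P_0^{-1} + A_1^T V_1^{-1} A_1$ and $\hat{x}_1 = \hat{x}_0 + P_1 A_1^T V_1^{-1}(b_1 - A_1\hat{x}_0)$, and compares against solving the stacked problem from scratch. The helper names `gls` and `rls_update` are hypothetical.

```python
import numpy as np

def gls(A, V, b):
    """Batch Gauss-Markov estimate and its error covariance P = (A^T V^{-1} A)^{-1}."""
    V_inv = np.linalg.inv(V)
    P = np.linalg.inv(A.T @ V_inv @ A)
    return P @ A.T @ V_inv @ b, P

def rls_update(x0, P0, A1, V1, b1):
    """Fold in a new batch (A1, V1, b1) without revisiting the old data."""
    V1_inv = np.linalg.inv(V1)
    P1 = np.linalg.inv(np.linalg.inv(P0) + A1.T @ V1_inv @ A1)  # information update
    x1 = x0 + P1 @ A1.T @ V1_inv @ (b1 - A1 @ x0)                 # correct by prediction error
    return x1, P1

rng = np.random.default_rng(1)
A0, A1 = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
V0, V1 = np.diag(rng.uniform(0.5, 2, 5)), np.diag(rng.uniform(0.5, 2, 4))
b0, b1 = rng.normal(size=5), rng.normal(size=4)

x0, P0 = gls(A0, V0, b0)
x1, P1 = rls_update(x0, P0, A1, V1, b1)

# Reference: solve the stacked problem from scratch.
A = np.vstack([A0, A1])
V = np.block([[V0, np.zeros((5, 4))], [np.zeros((4, 5)), V1]])
x_batch, P_batch = gls(A, V, np.concatenate([b0, b1]))
print(np.allclose(x1, x_batch), np.allclose(P1, P_batch))
```

The explicit inverses keep the sketch close to the formulas; practical recursive least-squares code usually applies the matrix inversion lemma so that only a matrix the size of the new batch is inverted.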
Suppose $x_1, \ldots, x_n$ is a set of scalar random variables. The Hilbert space defined by them is the set of all linear combinations $y = \sum_i \alpha_i x_i$, where the inner product between $y_1 = \sum_i \alpha_i x_i$ and $y_2 = \sum_j \beta_j x_j$ is defined as
$$\langle y_1, y_2 \rangle = E[y_1 y_2] = E\left[\left(\sum_i \alpha_i x_i\right)\left(\sum_j \beta_j x_j\right)\right].$$
We assume $\|y_i\|^2 = E[y_i^2] < \infty$, so variables have finite length.
Let $z_1, \ldots, z_m$ be a set of vector random variables, each of dimension $n$. The Hilbert space consists of linear combinations of these variables, defined as
$$y = K_1 z_1 + \cdots + K_m z_m,$$
where each $K_i$ is a real-valued $n \times n$ matrix, and the inner product is defined as
$$\langle y_1, y_2 \rangle = E[y_1^T y_2] = \sum_{k=1}^{n} E[(y_1)_k (y_2)_k] = \operatorname{Trace} E[y_1 y_2^T].$$
The length or norm of a vector random variable is defined as $\|y\| = \sqrt{\operatorname{Trace} E[y y^T]}$.
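To make the vector inner product concrete, this NumPy sketch replaces expectations with sample averages over hypothetical zero-mean samples, where $y_2 = M y_1$ plus small independent noise for an arbitrary matrix $M$ chosen here. The component-wise sum and the trace form agree exactly on the same samples, and both approach the population value $\operatorname{Trace}(M)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical zero-mean random vectors in R^3: y2 = M y1 + small noise.
N = 10_000
M = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.0],
              [0.2, 0.0, 2.0]])
y1 = rng.normal(size=(N, 3))                    # each row is one sample of y1
y2 = y1 @ M.T + 0.1 * rng.normal(size=(N, 3))   # each row is one sample of y2

# <y1, y2> as a sum of component-wise expectations: sum_k E[(y1)_k (y2)_k]
inner_sum = np.mean(np.sum(y1 * y2, axis=1))
# The same inner product written as Trace E[y1 y2^T]
inner_trace = np.trace(y1.T @ y2 / N)

# Norm ||y1|| = sqrt(Trace E[y1 y1^T]); for standard normal components this is about sqrt(3)
norm_y1 = np.sqrt(np.trace(y1.T @ y1 / N))
print(inner_sum, inner_trace, norm_y1)
```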
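Let us bring in the perspective of Hilbert spaces. The task is to find the minimum variance unbiased linear estimator $\hat{x} = Lb$, where the measurements are noisy and produced by the process $b = Ax + e$ with $E[ee^T] = V$. Geometrically speaking, we want the estimate $\hat{x}$ whose error has minimum expected length:
$$E\|x - \hat{x}\|^2 = E\|x - Lb\|^2 = E\|x - LAx - Le\|^2 = \langle x - LAx - Le,\; x - LAx - Le \rangle = \|x - LAx\|^2 + \operatorname{Trace}(L V L^T).$$
For an unbiased estimator $LA = I$, so the first term vanishes and we must minimize $\operatorname{Trace}(L V L^T)$.
Let $l_i^T$ be the $i$th row of $L$, and $a_j$ be the $j$th column of $A$. Since $\operatorname{Trace}(L V L^T) = \sum_i l_i^T V l_i$ and $LA = I$ means $l_i^T a_j = \delta_{ij}$, we get a set of separate minimization problems:

Minimize $l_i^T V l_i$ subject to $l_i^T a_j = \delta_{ij}$.

If we define the inner product $\langle x, y \rangle_V = x^T V y$, we get a set of minimum norm problems with constraints:

Minimize $\|l_i\|_V$ subject to $\langle l_i, V^{-1} a_j \rangle_V = \delta_{ij}$.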
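As an independent check that these constrained problems give back the Gauss-Markov estimator, this NumPy sketch solves each row problem with Lagrange multipliers (a KKT linear system, swapped in here instead of the Hilbert-space argument) for a random hypothetical $A$ and diagonal $V$, then compares the stacked rows with $L_0 = (A^T V^{-1} A)^{-1} A^T V^{-1}$. The function name `min_norm_row` is hypothetical.

```python
import numpy as np

def min_norm_row(A, V, i):
    """Minimize l^T V l subject to l^T a_j = delta_ij, via the KKT (Lagrange) system."""
    m, n = A.shape
    e_i = np.zeros(n)
    e_i[i] = 1.0
    # Stationarity: 2 V l + A lam = 0;  feasibility: A^T l = e_i
    K = np.block([[2.0 * V, A], [A.T, np.zeros((n, n))]])
    rhs = np.concatenate([np.zeros(m), e_i])
    sol = np.linalg.solve(K, rhs)
    return sol[:m]                       # drop the multipliers, keep l_i

rng = np.random.default_rng(3)
m, n = 5, 2
A = rng.normal(size=(m, n))
V = np.diag(rng.uniform(0.5, 2.0, size=m))
V_inv = np.linalg.inv(V)

L_rows = np.vstack([min_norm_row(A, V, i) for i in range(n)])
L0 = np.linalg.solve(A.T @ V_inv @ A, A.T @ V_inv)   # Gauss-Markov estimator
print(np.allclose(L_rows, L0))
```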
Consider now an affine subspace $Y$ consisting of all vectors $x$ that satisfy constraints of the form (as we will see soon, kernel methods use this formulation)
$$\langle x, y_1 \rangle = c_1, \quad \ldots, \quad \langle x, y_n \rangle = c_n,$$
where $y_1, \ldots, y_n$ are $n$ linearly independent vectors forming an $n$-dimensional subspace $M$ of the Hilbert space $H$. Suppose $c_i = 0$ for all $i$. Then $Y = M^\perp$ is the orthogonal complement of $M$. So, in the general case of nonzero $c_i$, the affine subspace $Y$ is the orthogonal complement $M^\perp$ translated by a vector determined by $c_1, \ldots, c_n$ (any single vector satisfying the constraints will do).
We say that an affine subspace $Y$ is of co-dimension $n$ if it is the orthogonal complement (perhaps translated) of a subspace of dimension $n$.
Projection Theorem: Let $H$ be a Hilbert space and $M$ be a finite-dimensional subspace generated by $n$ linearly independent vectors $y_1, \ldots, y_n$. Among all vectors $x$ satisfying the above constraints, there exists a unique vector $x_0 = \sum_i \beta_i y_i$ with minimum norm, whose coefficients satisfy the equations
$$\begin{aligned}
\langle y_1, y_1 \rangle \beta_1 + \cdots + \langle y_n, y_1 \rangle \beta_n &= c_1 \\
&\;\;\vdots \\
\langle y_1, y_n \rangle \beta_1 + \cdots + \langle y_n, y_n \rangle \beta_n &= c_n
\end{aligned}$$
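A minimal NumPy sketch of the projection theorem in $\mathbb{R}^6$ with the ordinary dot product, assuming two random hypothetical vectors $y_1, y_2$ and targets $c$. It solves the Gram system for $\beta$, forms $x_0 = \sum_i \beta_i y_i$, and then checks that another feasible vector, obtained by adding a component orthogonal to every $y_j$, is strictly longer. The function name `min_norm_solution` is hypothetical.

```python
import numpy as np

def min_norm_solution(Y, c):
    """Columns of Y are y_1..y_n. Return the minimum-norm x with <x, y_j> = c_j (dot product)."""
    G = Y.T @ Y                    # Gram matrix, G[j, i] = <y_i, y_j>
    beta = np.linalg.solve(G, c)   # the equations of the projection theorem
    return Y @ beta                # x0 = sum_i beta_i y_i

rng = np.random.default_rng(4)
Y = rng.normal(size=(6, 2))        # two linearly independent vectors in R^6
c = np.array([1.0, -2.0])

x0 = min_norm_solution(Y, c)
print(np.allclose(Y.T @ x0, c))    # constraints hold

# Any other feasible x differs from x0 by a vector orthogonal to every y_j,
# and such a shift can only increase the norm.
w = rng.normal(size=6)
w_perp = w - Y @ np.linalg.solve(Y.T @ Y, Y.T @ w)   # remove the component in span(Y)
x_other = x0 + w_perp
print(np.allclose(Y.T @ x_other, c), np.linalg.norm(x_other) > np.linalg.norm(x0))
```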
In the Gauss-Markov formulation, we assumed that the vector $x$ to be estimated was completely unknown, but fixed. We now show that if we instead treat $x$ as a random vector whose statistics are known, we can get a minimum variance estimator with lower variance than that produced by the Gauss-Markov estimator.

Theorem: Let $b$ and $x$ be random vectors, where $b = Ax + e$. The minimum variance linear estimate $\hat{x}$ of $x$ is given by
$$\hat{x} = E[x b^T]\left(E[b b^T]\right)^{-1} b,$$
and the covariance matrix of the error is given by
$$E[(x - \hat{x})(x - \hat{x})^T] = E[x x^T] - E[\hat{x} \hat{x}^T] = E[x x^T] - E[x b^T]\left(E[b b^T]\right)^{-1} E[b x^T].$$
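A NumPy sketch of the theorem with expectations replaced by sample averages, assuming hypothetical Gaussian samples of $x$ and $b = Ax + e$. With the gain $K = E[xb^T](E[bb^T])^{-1}$ computed from the same sample moments, the error $x - Kb$ is orthogonal to the data, and its covariance matches the closed-form expression exactly.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical jointly distributed samples: x in R^2, b = A x + e in R^3.
N = 5_000
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
x = rng.normal(size=(N, 2))
b = x @ A.T + 0.3 * rng.normal(size=(N, 3))

# Second moments (sample averages stand in for expectations)
Sxx = x.T @ x / N
Sxb = x.T @ b / N
Sbb = b.T @ b / N

K = Sxb @ np.linalg.inv(Sbb)       # x_hat = K b = E[x b^T] E[b b^T]^{-1} b
x_hat = b @ K.T

err = x - x_hat
err_cov = err.T @ err / N
print(np.allclose(err.T @ b / N, 0))                                 # error orthogonal to data
print(np.allclose(err_cov, Sxx - Sxb @ np.linalg.inv(Sbb) @ Sxb.T))  # matches the formula
```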
How do we know that we have indeed found a lower variance estimator?

Corollary: Let $b = Ax + e$, where $b$ is an observed $m$-dimensional data vector, $x$ is an unknown $n$-dimensional random vector with known covariance $R = E[x x^T]$, and $e$ is an unknown $m$-dimensional random vector with known covariance $V = E[e e^T]$. Further, assume that $E[x e^T] = 0$, so the noise is uncorrelated with $x$. The minimum variance linear estimate $\hat{x}$ minimizing $E\|x - \hat{x}\|^2$ is given by
$$\hat{x} = R A^T (A R A^T + V)^{-1} b,$$
with error covariance
$$E[(x - \hat{x})(x - \hat{x})^T] = R - R A^T (A R A^T + V)^{-1} A R.$$
Proof: Note that the covariance of $b$ is now
$$E[b b^T] = E[(Ax + e)(Ax + e)^T] = A R A^T + V,$$
and that
$$E[x b^T] = E[x (Ax + e)^T] = R A^T.$$
Applying the above minimum variance theorem gives the form of the linear estimator above.
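The proof's two covariance computations can be checked mechanically. This NumPy sketch assumes hypothetical $A$, $R$, and $V$, builds the joint second moments of $(x, b)$ from the block-diagonal covariance of $(x, e)$ via $b = Ax + e$, applies the general theorem, and compares the result with the corollary's closed form.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 4, 2
A = rng.normal(size=(m, n))
R = np.array([[2.0, 0.3], [0.3, 1.0]])        # assumed prior covariance E[x x^T]
V = np.diag([0.5, 1.0, 0.2, 2.0])             # assumed noise covariance E[e e^T]

# Corollary form
K_corollary = R @ A.T @ np.linalg.inv(A @ R @ A.T + V)
P_corollary = R - K_corollary @ A @ R

# Theorem form: (x, b) = T (x, e) with b = A x + e, and E[x e^T] = 0
T = np.block([[np.eye(n), np.zeros((n, m))], [A, np.eye(m)]])
S_z = np.block([[R, np.zeros((n, m))], [np.zeros((m, n)), V]])
S = T @ S_z @ T.T                              # joint second moments of (x, b)
Sxx, Sxb, Sbb = S[:n, :n], S[:n, n:], S[n:, n:]
K_theorem = Sxb @ np.linalg.inv(Sbb)
P_theorem = Sxx - K_theorem @ Sxb.T

print(np.allclose(K_corollary, K_theorem), np.allclose(P_corollary, P_theorem))
```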
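To check that we have a lower variance estimator, we rewrite the above corollary in an alternate form. Note the matrix identity
$$R A^T (A R A^T + V)^{-1} = (A^T V^{-1} A + R^{-1})^{-1} A^T V^{-1},$$
which can be shown by postmultiplying both sides by $(A R A^T + V)$ and premultiplying by $(A^T V^{-1} A + R^{-1})$. This implies that the minimum variance estimator can be written as
$$\hat{x} = R A^T (A R A^T + V)^{-1} b = (A^T V^{-1} A + R^{-1})^{-1} A^T V^{-1} b.$$
The error covariance can be rewritten similarly as
$$E[(x - \hat{x})(x - \hat{x})^T] = R - R A^T (A R A^T + V)^{-1} A R = R - (A^T V^{-1} A + R^{-1})^{-1} A^T V^{-1} A R = (A^T V^{-1} A + R^{-1})^{-1}$$
after some simplification. Since $A^T V^{-1} A + R^{-1} \succeq A^T V^{-1} A$, this error covariance is never larger than the Gauss-Markov error covariance $(A^T V^{-1} A)^{-1}$. The Gauss-Markov estimate is the limiting case of the minimum variance estimate as $R^{-1} \to 0$.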
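A NumPy sketch of this comparison, assuming a random hypothetical $A$, diagonal $V$, and prior $R = I$. It checks the matrix identity in both the gain and the error covariance, checks that $(A^T V^{-1} A)^{-1} - (A^T V^{-1} A + R^{-1})^{-1}$ is positive semidefinite, and uses a very large $R$ (so $R^{-1}$ is close to 0) to show the gain approaching the Gauss-Markov estimator. The function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 5, 2
A = rng.normal(size=(m, n))
V = np.diag(rng.uniform(0.5, 2.0, size=m))   # assumed noise covariance
V_inv = np.linalg.inv(V)

def gains_and_covariances(R):
    """Return the gain and error covariance of the estimator in both algebraic forms."""
    K_cov = R @ A.T @ np.linalg.inv(A @ R @ A.T + V)   # R A^T (A R A^T + V)^{-1}
    M = A.T @ V_inv @ A + np.linalg.inv(R)            # A^T V^{-1} A + R^{-1}
    K_info = np.linalg.solve(M, A.T @ V_inv)
    return K_cov, K_info, R - K_cov @ A @ R, np.linalg.inv(M)

K_cov, K_info, P_cov, P_info = gains_and_covariances(np.eye(n))   # hypothetical prior R = I
print(np.allclose(K_cov, K_info), np.allclose(P_cov, P_info))

# Gauss-Markov estimator for the same A and V (no prior on x)
P_gm = np.linalg.inv(A.T @ V_inv @ A)
L_gm = P_gm @ A.T @ V_inv
print(np.linalg.eigvalsh(P_gm - P_info).min() >= -1e-12)  # P_gm - P_info is PSD

# A very vague prior (R^{-1} close to 0) recovers the Gauss-Markov gain.
_, K_vague, _, _ = gains_and_covariances(1e8 * np.eye(n))
print(np.allclose(K_vague, L_gm, atol=1e-6))
```

The information form only inverts $n \times n$ matrices, which is why it is the natural one to use when $R^{-1}$ is small or zero.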