Optimizing Engineering: Minimizing Error in Hypothesis Detection and Separation - Prof. To, Study Guides, Projects, Research of Electrical and Electronics Engineering

Solutions to three engineering optimization problems related to hypothesis testing and hyperplane separation. Topics include likelihood ratio tests, receiver operating characteristics (ROC), and support vector machines (SVM). MATLAB solutions are also included.

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 07/23/2009

koofers-user-w2e
Engineering Optimization as a Problem Solving Tool:
Exam I Engineering Optimization ECE 802-604
Saad Bin Qaisar
ECE Department, Michigan State University,
East Lansing, MI- 48824, United States
E-mail: qaisarsa@egr.msu.edu
Abstract – This report presents solutions to three problems for the Engineering Optimization class.
Index Terms – Engineering Optimization, SVM

I. PROBLEM 1

The task is to reformulate the detection problem (for M hypotheses) as an optimization problem, and to show that this reinterpretation yields the same solution.

A. Hypothesis Testing

$$H_i:\; y = x_i + n_i, \qquad i = 1, 2, \ldots, M$$

$p_i$ = noise distribution

$M = 2$

$$p_i(y) = p(y \mid H_i), \qquad i = 1, 2$$

Likelihood ratio:

$$l = \frac{p_1(y)}{p_2(y)}$$

Threshold: $\tau$

Hypothesis selection:

$$H_1 \ \text{if}\ l > \tau, \qquad H_2 \ \text{if}\ l < \tau$$

Suppose $Y$ is a random variable with values in $\{1,\ldots,n\}$, with a distribution that depends on a parameter $\theta \in \{1,\ldots,m\}$. The distributions of $Y$, for the $m$ possible values of $\theta$, can be represented by a matrix $P \in \mathbf{R}^{n \times m}$, with elements:

$$p_{kj} = \mathrm{prob}(Y = k \mid \theta = j)$$

The $j$th column of $P$ gives the probability distribution associated with the parameter value $\theta = j$. We consider the problem of estimating $\theta$, based on an observed sample of $Y$. The $m$ values of $\theta$ are called the hypotheses, and finding the best value of $\theta$ is hypothesis testing.

A randomized detector of $\theta$ is a random variable $\hat\theta \in \{1,\ldots,m\}$, with a distribution dependent on the observed value of $Y$. A randomized detector is defined by a matrix $T \in \mathbf{R}^{m \times n}$ with elements

$$t_{ik} = \mathrm{prob}(\hat\theta = i \mid Y = k)$$

For a randomized detector defined by the matrix $T$, we define the detection probability matrix as $D = TP \in \mathbf{R}^{m \times m}$. We have:

$$D_{ij} = (TP)_{ij} = \mathrm{prob}(\hat\theta = i \mid \theta = j)$$

The diagonal entries of $D$, arranged in a vector, give the detection probabilities, denoted $P^d$:

$$P^d_i = D_{ii} = \mathrm{prob}(\hat\theta = i \mid \theta = i)$$

The error probabilities are the complements, denoted $P^e$:

$$P^e_i = 1 - D_{ii} = \sum_{j \neq i} D_{ji} = \mathrm{prob}(\hat\theta \neq i \mid \theta = i)$$
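The definitions above can be checked numerically. The sketch below uses made-up values for $P$ and $T$ (neither matrix comes from the original problem) and forms the detection probability matrix together with the detection and error probabilities:

```matlab
% Hypothetical example: n = 4 outcomes, m = 2 hypotheses.
% Column j of P is the distribution of Y when theta = j (values are made up).
P = [0.70 0.10;
     0.20 0.15;
     0.05 0.70;
     0.05 0.05];

% Column k of T is the distribution of theta_hat when Y = k (also made up).
T = [1.0 0.8 0.0 0.5;
     0.0 0.2 1.0 0.5];

D  = T * P;        % D(i,j) = prob(theta_hat = i | theta = j)
Pd = diag(D);      % detection probabilities
Pe = 1 - Pd;       % error probabilities

% Every column of P, T and D is a probability distribution.
assert(all(abs(sum(P, 1) - 1) < 1e-12));
assert(all(abs(sum(T, 1) - 1) < 1e-12));
assert(all(abs(sum(D, 1) - 1) < 1e-12));

disp(D);
disp([Pd Pe]);
```

Since the columns of $D$ sum to one, each error probability equals the sum of the off-diagonal entries in its column.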

Thus, the optimal detector design for the multi-hypothesis problem can be formulated by introducing a weighting matrix $W \in \mathbf{R}^{m \times m}$ (for scalarization) for $D$, which satisfies:

$$W_{ii} = 0, \quad i = 1,\ldots,m; \qquad W_{ij} > 0, \quad i, j = 1,\ldots,m, \quad i \neq j$$

$W$ is a weighting matrix, with weight $W_{ij}$ associated with the error of guessing $\hat\theta = i$ when in fact $\theta = j$. Thus the multi-hypothesis detection problem can be rephrased as

$$\begin{array}{ll} \text{Minimize} & \mathrm{tr}(W^T D) \\ \text{Subject to} & t_k \succeq 0, \quad \mathbf{1}^T t_k = 1, \quad k = 1,\ldots,n \end{array} \qquad (1)$$

where $t_k$ denotes the $k$th column of $T$ and $D = TP$.
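One way to see why problem (1) is easy to solve is to expand the objective column by column, with $t_k$ the $k$th column of $T$. The following is a sketch under the definitions above; the vector $c_k$ is shorthand introduced here and does not appear in the original:

```latex
\begin{aligned}
\operatorname{tr}(W^T D)
  &= \sum_{i=1}^{m}\sum_{j=1}^{m} W_{ij} D_{ij}
   = \sum_{i=1}^{m}\sum_{j=1}^{m} W_{ij} \sum_{k=1}^{n} t_{ik}\, p_{kj} \\
  &= \sum_{k=1}^{n} c_k^T t_k,
  \qquad (c_k)_i = \sum_{j=1}^{m} W_{ij}\, p_{kj}.
\end{aligned}
```

Each column $t_k$ is constrained separately to the probability simplex, so the linear objective is minimized by putting all of the mass of $t_k$ on an index $i$ with the smallest $(c_k)_i$. Under this reading an optimal detector can be taken to be deterministic, with ties broken arbitrarily.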

For Binary Hypothesis Testing

Since $m = 2$ in our problem, it is a case of binary hypothesis testing. Suppose the event of interest is the occurrence of $x_2$. If $y = x_2 + n_2$, we say that the event of interest did occur (positive test). If $y = x_1 + n_1$, we say that the event did not occur (negative test). The detection probability matrix $D \in \mathbf{R}^{2 \times 2}$ is traditionally expressed as

$$D = \begin{bmatrix} 1 - P_{fp} & P_{fn} \\ P_{fp} & 1 - P_{fn} \end{bmatrix}$$

where $P_{fn}$ is the probability of a false negative (i.e. the test is negative when the event has occurred) and $P_{fp}$ is the probability of a false positive (i.e. the test is positive when the event did not occur).

We assume the random variable $Y$ to be generated from one of two distributions, $p \in \mathbf{R}^n$ and $q \in \mathbf{R}^n$. The optimal trade-off curve between $P_{fn}$ and $P_{fp}$ is called the receiver operating characteristic (ROC), determined by the distributions $p$ and $q$. When $l > \tau$, the event did not occur, i.e. $\hat y = x_1$. When $l < \tau$, the event did occur, i.e. $\hat y = x_2$.

Thus, from our scalarized multi-hypothesis testing problem formulated in (1), we have:

$$\hat y = \begin{cases} x_1, & W_{21}\, p_k > W_{12}\, q_k \\ x_2, & W_{21}\, p_k < W_{12}\, q_k \end{cases}$$

This implies a likelihood ratio test: if the ratio $p_k / q_k$ is more than the threshold $W_{12} / W_{21}$, the test is negative (i.e. $\hat y = x_1$); otherwise, the test is positive. Hence, the reinterpretation also results in the same solution. Similar arguments can be made for Bayesian detection, minimax detection, and Neyman-Pearson detection.
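The equivalence between the scalarized problem and the likelihood ratio test can be illustrated on a small example. This is a minimal sketch with made-up distributions and weights, assuming $p$ is the distribution of $Y$ when $\theta = 1$ and $q$ the distribution when $\theta = 2$; it builds the detector by minimizing the weighted cost for each outcome and compares it with the threshold rule:

```matlab
% Hypothetical distributions of Y over n = 4 outcomes (values are made up).
p = [0.70; 0.20; 0.05; 0.05];   % distribution of Y when theta = 1 (event absent)
q = [0.10; 0.15; 0.70; 0.05];   % distribution of Y when theta = 2 (event present)
P = [p q];

% Hypothetical weights: W(i,j) penalizes guessing i when the truth is j.
W = [0 2;
     1 0];

% Cost of each guess for each outcome: C(i,k) = sum_j W(i,j) * P(k,j).
C = W * P';

% For every outcome, put all probability mass on the cheapest guess.
[~, guess] = min(C, [], 1);
n = numel(p);
T = zeros(2, n);
T(sub2ind(size(T), guess, 1:n)) = 1;

% Decisions from the likelihood ratio test with threshold W12 / W21.
lrt_guess = 1 + (p ./ q < W(1,2) / W(2,1))';

D   = T * P;
Pfp = D(2,1);                   % false positive probability
Pfn = D(1,2);                   % false negative probability
fprintf('LRT agrees: %d, Pfp = %.3f, Pfn = %.3f\n', ...
        isequal(guess, lrt_guess), Pfp, Pfn);
```

Sweeping the ratio $W_{12} / W_{21}$ over a range of values and recording the resulting $(P_{fp}, P_{fn})$ pairs would trace out points on the ROC curve for these two distributions.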

II. PROBLEM II

Maximizing the separation between the hyperplane and the closest points is equivalent to minimizing $\|w\|$.

A. Approach I (Covers 2a,2b,2c,2d)

We have a problem of optimally separating the set of training vectors belonging to two separate classes,

$$D = \{(x_1, y_1), \ldots, (x_l, y_l)\}, \quad x \in \mathbf{R}^n, \quad y \in \{-1, 1\}$$

with a hyperplane

$$\langle w, x \rangle + b = 0 \qquad (1)$$

We intend to maximize the distance between the closest vector and the hyperplane. Without loss of generality, it is appropriate to consider a canonical hyperplane, where the parameters $w$, $b$ are constrained by:

$$\min_i |\langle w, x_i \rangle + b| = 1 \qquad (2)$$

This implies that the norm of the weight vector should equal the inverse of the distance from the nearest point in the data set to the hyperplane. A separating hyperplane in canonical form must satisfy the following constraints:

$$y_i [\langle w, x_i \rangle + b] \geq 1, \quad i = 1, \ldots, l \qquad (3)$$

The distance $d(w, b; x)$ of a point $x$ from the hyperplane $(w, b)$ is:

$$d(w, b; x) = \frac{|\langle w, x \rangle + b|}{\|w\|} \qquad (4)$$

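The optimal hyperplane is given by maximizing the margin $\rho$, subject to the constraints of equation (3):

$$\begin{aligned}
\rho(w, b) &= \min_{x_i:\, y_i = -1} d(w, b; x_i) + \min_{x_i:\, y_i = 1} d(w, b; x_i) \\
&= \min_{x_i:\, y_i = -1} \frac{|\langle w, x_i \rangle + b|}{\|w\|} + \min_{x_i:\, y_i = 1} \frac{|\langle w, x_i \rangle + b|}{\|w\|} \\
&= \frac{1}{\|w\|} \left( \min_{x_i:\, y_i = -1} |\langle w, x_i \rangle + b| + \min_{x_i:\, y_i = 1} |\langle w, x_i \rangle + b| \right) \\
&= \frac{2}{\|w\|}
\end{aligned} \qquad (5)$$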
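The margin expression in (5) can be checked numerically on a toy data set. This is a minimal sketch in which the points, the labels and the hand-picked canonical hyperplane are all hypothetical:

```matlab
% Hypothetical 2-D training set (rows are points), labels in {-1, +1}.
X = [2 2; 3 3; 0 0; -1 0];
y = [1; 1; -1; -1];

% A canonical separating hyperplane for this data, chosen by hand.
w = [0.5; 0.5];
b = -1;

g = X * w + b;                          % <w, x_i> + b for every point
assert(abs(min(abs(g)) - 1) < 1e-12);   % canonical form, equation (2)
assert(all(y .* g >= 1 - 1e-12));       % separation constraints, equation (3)

d   = abs(g) / norm(w);                 % point-to-hyperplane distances, equation (4)
rho = min(d(y == -1)) + min(d(y == 1)); % margin as defined in equation (5)

assert(abs(rho - 2 / norm(w)) < 1e-12);
fprintf('rho = %.4f, 2/||w|| = %.4f\n', rho, 2 / norm(w));
```

The closest point of each class lies at distance $1/\|w\|$ from a canonical hyperplane, which is why the two minima add up to $2/\|w\|$.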

Hence the hyperplane that optimally separates the data is the one that minimizes:

$$\Phi(w) = \frac{1}{2} \|w\|^2 \qquad (6)$$

The solution to the optimization problem of (6) under the constraints of (3) is obtained using the Lagrange function. Thus,

$$\Phi(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i \bigl( y_i [\langle w, x_i \rangle + b] - 1 \bigr) \qquad (7)$$

where $\alpha$ are the Lagrange multipliers. The Lagrangian has to be minimized with respect to $w$, $b$, and maximized with respect to $\alpha \geq 0$. Classical Lagrangian duality enables the primal problem, (7), to be transformed to its dual problem, which is both conceptually and computationally easier to solve. Thus, the dual problem is:

$$\max_{\alpha} W(\alpha) = \max_{\alpha} \Bigl( \min_{w, b} \Phi(w, b, \alpha) \Bigr) \qquad (8)$$

The minimum with respect to w and b of the Lagrangian, Φ , is

given by,
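As a sketch of that step, assume the Lagrangian has the usual form $\Phi(w,b,\alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i (y_i[\langle w, x_i\rangle + b] - 1)$. Setting its partial derivatives to zero gives the conditions below. This is the standard textbook result for a Lagrangian of this form, offered as a sketch rather than a transcription of the author's expressions:

```latex
\frac{\partial \Phi}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0,
\qquad
\frac{\partial \Phi}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i .

% Substituting these back into the Lagrangian gives the dual objective:
W(\alpha) = \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
    \alpha_i \alpha_j \, y_i y_j \, \langle x_i, x_j \rangle,
\qquad \alpha_i \ge 0, \quad \sum_{i=1}^{l} \alpha_i y_i = 0 .
```

The dual depends on the data only through the inner products $\langle x_i, x_j \rangle$, and its constraints are simple bounds plus one linear equality.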

Since $|a_1| = |a_2| = a$ (const.), we have the separation between $H_1$ and $H_2$ as $2a / \|w\|$. From these considerations, it follows that identification of the optimum separating hyperplane is performed by maximizing $2a / \|w\|$, which is equivalent to minimizing $\|w\| / (2a)$, or minimizing $\|w\|^2$.

Note: We can reach a similar conclusion by following the approach in Section 8.6.1, Robust Linear Discrimination, and Problem 8.23 of [1], with slight modifications of the affine functions, and solving the dual problem.

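2 (b):

By the change of variables $\tilde w = w / a$, $\tilde b = b / a$, the equivalent convex optimization problem can be written as:

$$\begin{array}{ll} \text{Minimize} & \|\tilde w\|^2 \\ \text{Subject to} & \tilde w^T x_i + \tilde b \geq 1, \quad y_i = +1 \\ & \tilde w^T x_i + \tilde b \leq -1, \quad y_i = -1 \end{array}$$

It can also be stated as:

$$\begin{array}{ll} \text{Minimize} & \|\tilde w\|^2 \\ \text{Subject to} & y_i (\tilde w^T x_i + \tilde b) \geq 1 \end{array}$$

It is a quadratic programming problem with affine constraints.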
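A problem of this form can be handed to a generic quadratic programming solver. The sketch below is one possible setup, not the author's attached solution: it assumes the Optimization Toolbox function `quadprog` is available, uses a made-up separable data set, and minimizes $\tfrac{1}{2}\|w\|^2$, which has the same minimizer as $\|w\|^2$:

```matlab
% Hypothetical linearly separable 2-D data (rows are points), labels in {-1, +1}.
X = [2 2; 3 3; 0 0; -1 0];
y = [1; 1; -1; -1];
[l, n] = size(X);

% Decision variable z = [w; b]. Objective (1/2)*z'*H*z = (1/2)*||w||^2.
H = blkdiag(eye(n), 0);
f = zeros(n + 1, 1);

% y_i*(w'*x_i + b) >= 1 rewritten in the solver's form A*z <= c.
A = -diag(y) * [X, ones(l, 1)];
c = -ones(l, 1);

z = quadprog(H, f, A, c);     % assumes the Optimization Toolbox is installed
w = z(1:n);
b = z(n + 1);

fprintf('w = [%g %g], b = %g, margin = %g\n', w, b, 2 / norm(w));
```

The offset $b$ does not appear in the objective, so the corresponding diagonal entry of `H` is zero and the problem is convex but not strictly convex in `z`.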

III. PROBLEM III

We are given a set of measurements $y$ that are the output of some function $f(x)$:

$$y = f(x) + n$$

where $n$ corresponds to noise generated according to some probability distribution $p$. We wish to fit a function $\hat f$ to the data given $M$ input-output pairs $\{x_i, y_i\}_{i=1}^{M}$. We can formulate multiple minimization criteria, some of which include:

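Sum Squared Error:

$$\text{Minimize} \quad \sum_{i=1}^{M} \bigl( f(x_i) - \hat f(x_i) \bigr)^2 \qquad (1)$$

Mean Squared Error:

$$\text{Minimize} \quad \frac{1}{M} \sum_{i=1}^{M} \bigl( f(x_i) - \hat f(x_i) \bigr)^2 \qquad (2)$$

Root Mean Squared Error:

$$\text{Minimize} \quad \sqrt{\frac{1}{M} \sum_{i=1}^{M} \bigl( f(x_i) - \hat f(x_i) \bigr)^2} \qquad (3)$$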

The problem is not convex in general. For it to be convex, the squared difference $\bigl( f(x_i) - \hat f(x_i) \bigr)^2$ must be convex in the parameters being estimated ($a$, $b$, $c$ in the case of Problem 3(b)).
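For a polynomial model this convexity requirement is met. As a sketch, assume the quadratic model $\hat f(x) = a + bx + cx^2$ (the assignment of $a$, $b$, $c$ to powers of $x$ is an assumption here) and let the measurements $y_i$ stand in for $f(x_i)$:

```latex
\mathrm{SSE}(\theta)
  = \sum_{i=1}^{M} \bigl( y_i - a - b x_i - c x_i^2 \bigr)^2
  = \| X\theta - y \|_2^2,
\qquad
X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_M & x_M^2 \end{bmatrix},
\quad
\theta = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.

\nabla^2\, \mathrm{SSE}(\theta) = 2 X^T X \succeq 0
\quad \Longrightarrow \quad
X^T X \, \theta^\star = X^T y .
```

Because the model is linear in $\theta$, the Hessian is positive semidefinite and any solution of the normal equations is a global minimizer. MSE and RMSE are increasing functions of SSE, so all three criteria share the same minimizer.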

Problem 3(b)

Figure 2: Polynomial Coefficients a,b,c

Figure 3: Least Squares Polynomial Fit (plot title: Least Squares Polynomial Curve Fit; axes: x, f(x))

Sum Squared Error (SSE) = 0.

Mean Squared Error (MSE) = 0.

Root Mean Squared Error= 0.

See Annex A and attached files for code

REFERENCES

[1] Stephen Boyd, Convex Optimization, Cambridge University Press, 2006.

[2] Support Vector Machines, Optimum Separation Hyperplane, http://www.support-vector-machines.org/SVM_osh.html

[3] Steve R. Gunn, Support Vector Machines for Classification and Regression, TR-

[4] University of Southampton, MATLAB Support Vector Machine Toolbox, http://www.isis.ecs.soton.ac.uk/resources/svminfo/

ANNEX A

A. Code for Problem 3

%This script calculates the polynomial least squares fit for the
%data, based upon the method described at:
%http://mathworld.wolfram.com/LeastSquaresFittingPolynomial.html

clear all
load problem3_v
x=problem3_Data(:,1);
y=problem3_Data(:,2);
n=2;                          %Degree of the polynomial

%Generating a Vandermonde matrix with columns [x.^2, x, 1]
X(:,n+1) = ones(length(x),1);
for j = n:-1:1
    X(:,j) = x.*X(:,j+1);
end
X=fliplr(X);                  %Reorder the columns to [1, x, x.^2]

%Getting the coefficients from the normal equations
a=(X'*X)\(X'*y);

%Obtaining an approximation
y_new=a(1)+a(2)*x+a(3)*x.^2;

%Plotting the results
plot(x,y_new);
hold on
plot(x,y,'r')
xlabel('x')
ylabel('f(x)')
title('Least Squares Polynomial Curve Fit');
grid on
hold off

%Plotting the coefficients: a(1) constant, a(2) linear, a(3) quadratic
figure;
stem(a);
title('Polynomial Coefficients a,b,c')
grid on

e=sum((y_new-y).^2);          %Sum squared error
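The three error measures reported for Problem 3(b) can be computed from the same residual. The sketch below is not part of the original submission: it substitutes synthetic data for the measurement file, with made-up coefficients and noise level, and cross-checks the fit against the built-in `polyfit`:

```matlab
% Synthetic stand-in for the measurement data (all values are made up).
x = linspace(0, 3, 50)';
y = 1 + 0.5*x + 0.8*x.^2 + 0.1*randn(size(x));

X = [ones(size(x)), x, x.^2];   % columns [1, x, x.^2]
a = X \ y;                      % least squares coefficients, lowest power first
y_new = X * a;

M    = numel(y);
SSE  = sum((y - y_new).^2);     % sum squared error
MSE  = SSE / M;                 % mean squared error
RMSE = sqrt(MSE);               % root mean squared error

% polyfit returns the coefficients with the highest power first.
a_ref = polyfit(x, y, 2);
assert(norm(flipud(a) - a_ref(:)) < 1e-8);

fprintf('SSE = %.4f, MSE = %.4f, RMSE = %.4f\n', SSE, MSE, RMSE);
```

Solving with the backslash operator avoids forming $X^T X$ explicitly, which tends to be better conditioned than the normal equations when the polynomial degree grows.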

B. Code for Problem 2

Please see the attached zipped files.