



Solutions to three engineering optimization problems related to hypothesis testing and hyperplane separation. Topics include likelihood ratio tests, receiver operating characteristics (ROC), and support vector machines (SVM). MATLAB code accompanies the solutions.




Saad Bin Qaisar
ECE Department, Michigan State University,
East Lansing, MI- 48824, United States
E-mail: qaisarsa@egr.msu.edu
Abstract – This report presents solutions to three problems for an Engineering
Optimization class.
Index Terms – Engineering Optimization, SVM
I. PROBLEM I
The task is to reformulate the detection problem (for M hypotheses)
as an optimization problem, and to show that this reinterpretation
yields the same solution.
A. Hypothesis Testing
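$$H_i : y = x_i + n_i, \quad i = 1, 2, \ldots, M$$
where $p_i$ is the noise distribution and
$$p_i(y) = p(y \mid H_i), \quad i = 1, 2$$
Likelihood ratio:
$$l = \frac{p_2(y)}{p_1(y)}$$
Threshold: $\tau$
Hypothesis selection:
$$\hat{H} = \begin{cases} H_1 & \text{if } l \le \tau \\ H_2 & \text{if } l > \tau \end{cases}$$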
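Suppose $Y$ is a random variable with values in $\{1, \ldots, n\}$, with
a distribution that depends on a parameter $\theta \in \{1, \ldots, m\}$.
The distributions of $Y$ for the $m$ values of $\theta$ can be
represented by a matrix $P \in \mathbb{R}^{n \times m}$, with elements:
$$p_{kj} = \mathrm{prob}(Y = k \mid \theta = j)$$
The $j$th column of $P$ gives the probability distribution of $Y$
associated with the hypothesis $\theta = j$. A randomized detector is a
random variable $\hat{\theta} \in \{1, \ldots, m\}$ with a distribution
dependent on the observed value of $Y$. It can be defined by a matrix
$T \in \mathbb{R}^{m \times n}$ with elements:
$$t_{ik} = \mathrm{prob}(\hat{\theta} = i \mid Y = k)$$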
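For a randomized detector defined by the matrix $T$, we define the
detection probability matrix as $D = TP \in \mathbb{R}^{m \times m}$. We have:
$$D_{ij} = (TP)_{ij} = \mathrm{prob}(\hat{\theta} = i \mid \theta = j)$$
The diagonal entries of $D$, arranged in a vector, give the detection
probabilities, denoted $P^d$:
$$P^d_i = D_{ii} = \mathrm{prob}(\hat{\theta} = i \mid \theta = i)$$
The error probabilities are the complements, denoted $P^e$:
$$P^e_i = 1 - D_{ii} = \sum_{j \ne i} D_{ji} = \mathrm{prob}(\hat{\theta} \ne i \mid \theta = i)$$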
Thus, the optimal detector design for the multi-hypothesis problem
can be formulated by introducing a weighting matrix (for
scalarization) $W \in \mathbb{R}^{m \times m}$ for $D$, where $W$ satisfies:
$$W_{ii} = 0, \quad i = 1, \ldots, m$$
$$W_{ij} > 0, \quad i, j = 1, \ldots, m, \; i \ne j$$
$W_{ij}$ is the weight associated with the error of guessing
$\hat{\theta} = i$ when in fact $\theta = j$. With $t_k$ denoting the
$k$th column of $T$, the multi-hypothesis detection problem can be
rephrased as
$$\begin{array}{ll} \text{minimize} & \mathrm{tr}(W^T D) \\ \text{subject to} & t_k \succeq 0, \quad \mathbf{1}^T t_k = 1, \quad k = 1, \ldots, n \end{array} \quad (1)$$
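Problem (1) separates over the columns of $T$, which the following MATLAB sketch illustrates. The matrices `P` and `W` below are made-up illustrative values, not data from the problem set, and the variable names are assumptions. Because tr($W^T D$) is a sum of linear terms, one per column $t_k$, a minimizer can be found by placing all the probability mass of each column on the cheapest guess.

```matlab
% Hypothetical example (not from the problem set): n = 4 outcomes, m = 3 hypotheses.
% Column j of P is the distribution of Y under theta = j.
P = [0.70 0.10 0.05;
     0.20 0.10 0.15;
     0.05 0.70 0.10;
     0.05 0.10 0.70];
% W(i,j) is the weight for guessing i when theta = j; the diagonal is zero.
W = ones(3) - eye(3);

[n, m] = size(P);
% Cost of guessing i after observing Y = k: C(i,k) = sum_j W(i,j)*P(k,j).
C = W * P';
% tr(W'*D) = sum_k C(:,k)'*t_k, so each column t_k is chosen independently.
[~, guess] = min(C, [], 1);
T = zeros(m, n);
T(sub2ind([m, n], guess, 1:n)) = 1;   % deterministic detector

D = T * P;               % detection probability matrix
cost = trace(W' * D);    % value of the scalarized objective
Pd = diag(D);            % detection probabilities
```

With these assumed values the detector guesses hypothesis 1 for outcomes 1 and 2, hypothesis 2 for outcome 3, and hypothesis 3 for outcome 4.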
B. Binary Hypothesis Testing
Since $m = 2$ in our problem, it is a case of binary
hypothesis testing. Suppose we are interested in the occurrence of $x_2$. Thus, if
$y = x_2 + n_2$, we say that our event of interest did occur (positive
test). If $y = x_1 + n_1$, we say that the event did not occur (negative
test). The detection probability matrix $D \in \mathbb{R}^{2 \times 2}$ is traditionally
expressed as
$$D = \begin{bmatrix} 1 - P_{fp} & P_{fn} \\ P_{fp} & 1 - P_{fn} \end{bmatrix}$$
where $P_{fn}$ is the probability of a false negative (i.e. the test is
negative when the event has occurred) and $P_{fp}$ is the probability of a
false positive (i.e. the test is positive when the event did not occur).
We assume the random variable $Y$ is generated from one of
two distributions, $p \in \mathbb{R}^n$ and $q \in \mathbb{R}^n$. The optimal trade-off
curve between $P_{fn}$ and $P_{fp}$ is called the receiver operating
characteristic (ROC), determined by the distributions $p$ and $q$. When
$l \le \tau$, the event did not occur, i.e. $\hat{y} = x_1$. When $l > \tau$, the event did
occur, i.e. $\hat{y} = x_2$.
Thus, from our scalarized multi-hypothesis testing formulated
in (1), we have:
$$\hat{y} = \begin{cases} x_1 & \text{if } W_{21} p_k > W_{12} q_k \\ x_2 & \text{if } W_{21} p_k \le W_{12} q_k \end{cases}$$
This implies a likelihood ratio test: if the ratio $p_k / q_k$ exceeds the
threshold $W_{12} / W_{21}$, the test is negative (i.e. $\hat{y} = x_1$); otherwise, the test is
positive. Equivalently, writing $l = q_k / p_k$, this is a threshold test on $l$
with $\tau = W_{21} / W_{12}$. Hence, the reinterpretation results in the same
solution. Similar arguments can be made for Bayesian detection,
minimax detection, and Neyman-Pearson detection.
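The equivalence between the scalarized rule and the likelihood ratio test can be checked numerically. The sketch below uses made-up distributions `p` and `q` and equal weights; all values and variable names are illustrative assumptions. It applies both forms of the rule and then reads off $P_{fp}$ and $P_{fn}$ for the resulting detector.

```matlab
% Hypothetical distributions over n = 4 outcomes (illustrative values only).
p = [0.70; 0.20; 0.05; 0.05];   % distribution of Y when the event did not occur
q = [0.10; 0.10; 0.70; 0.10];   % distribution of Y when the event occurred
W12 = 1;  W21 = 1;              % assumed weights for the two error types

% Rule from the scalarized problem: negative test where W21*p_k > W12*q_k.
negScalar = (W21 * p > W12 * q);
% The same rule written as a likelihood ratio test on p_k/q_k.
negLRT = (p ./ q > W12 / W21);

Pfp = sum(p(~negLRT));   % positive test although the event did not occur
Pfn = sum(q(negLRT));    % negative test although the event occurred
```

Sweeping the ratio `W12/W21` over positive values would trace out points on the trade-off curve between $P_{fn}$ and $P_{fp}$.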
II. PROBLEM II
Maximizing the separation between the hyperplane and the
closest points is equivalent to minimizing $\|w\|$.
A. Approach I (Covers 2a,2b,2c,2d)
We have a problem of optimally separating the set of training
vectors belonging to two separate classes,
$$D = \{(x_1, y_1), \ldots, (x_l, y_l)\}, \quad x \in \mathbb{R}^n, \; y \in \{-1, 1\}$$
with a hyperplane
$$\langle w, x \rangle + b = 0 \quad (1)$$
We intend to maximize the distance between the closest vector
and the hyperplane. Without loss of generality, it is appropriate to
consider a canonical hyperplane, where the parameters w,b are
constrained by:
$$\min_i |\langle w, x_i \rangle + b| = 1 \quad (2)$$
This implies that the norm of the weight vector equals the inverse
of the distance from the hyperplane to the nearest point in the
data set. A separating hyperplane in canonical form must
satisfy the following constraints:
$$y_i [\langle w, x_i \rangle + b] \ge 1, \quad i = 1, \ldots, l \quad (3)$$
The distance $d(w, b; x_i)$ of a point $x_i$ from the hyperplane $(w, b)$ is:
$$d(w, b; x_i) = \frac{|\langle w, x_i \rangle + b|}{\|w\|} \quad (4)$$
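The optimal hyperplane is given by maximizing the margin $\rho$,
subject to the constraints of equation (3).
$$\begin{aligned} \rho(w, b) &= \min_{x_i : y_i = -1} d(w, b; x_i) + \min_{x_i : y_i = 1} d(w, b; x_i) \\ &= \min_{x_i : y_i = -1} \frac{|\langle w, x_i \rangle + b|}{\|w\|} + \min_{x_i : y_i = 1} \frac{|\langle w, x_i \rangle + b|}{\|w\|} \\ &= \frac{1}{\|w\|} \left( \min_{x_i : y_i = -1} |\langle w, x_i \rangle + b| + \min_{x_i : y_i = 1} |\langle w, x_i \rangle + b| \right) \\ &= \frac{2}{\|w\|} \end{aligned} \quad (5)$$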
Hence the hyperplane that optimally separates the data is the one
that minimizes:
$$\Phi(w) = \frac{1}{2} \|w\|^2 \quad (6)$$
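As a numerical illustration of (2) to (5), the sketch below evaluates the canonical-form condition, the separation constraints, the point distances, and the margin for a small made-up data set. The points, labels, and the hyperplane `(w, b)` are illustrative assumptions, chosen so that the hyperplane is in canonical form.

```matlab
% Hypothetical 2-D training set (not from the problem set); rows of X are points.
X = [2 2; 3 3; 0 0; -1 0];
y = [1; 1; -1; -1];
% A hyperplane assumed to be in canonical form for this data.
w = [0.5; 0.5];
b = -1;

f = X * w + b;                                % <w, x_i> + b for every point
isCanonical  = abs(min(abs(f)) - 1) < 1e-12;  % condition (2)
isSeparating = all(y .* f >= 1 - 1e-12);      % constraints (3)

d = abs(f) / norm(w);                         % distances (4)
rho = min(d(y == -1)) + min(d(y == 1));       % margin (5)
```

For this data `rho` equals `2/norm(w)`, so a smaller norm of `w` corresponds directly to a larger margin.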
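The solution to the optimization problem of (6) under the
constraints of (3) is given using the Lagrange function. Thus,
$$\Phi(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i [\langle w, x_i \rangle + b] - 1 \right) \quad (7)$$
where $\alpha_i$ are the Lagrange multipliers. The Lagrangian is
minimized with respect to $w$, $b$, and maximized with respect to
$\alpha \ge 0$. Lagrangian duality transforms the primal problem,
(7), to its dual problem, which is both conceptually and
computationally easier to solve. Thus, the dual problem is:
$$\max_{\alpha} W(\alpha) = \max_{\alpha} \left( \min_{w, b} \Phi(w, b, \alpha) \right) \quad (8)$$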
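The minimum with respect to $w$ and $b$ of the Lagrangian, $\Phi$, is
given by,
$$\frac{\partial \Phi}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad \frac{\partial \Phi}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i$$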
Since $|a_1| = |a_2| = a$ (const.), the separation between the two
hyperplanes that bound the classes is $2a / \|w\|$. From these
considerations, it follows that the optimum separating hyperplane is
identified by maximizing $2a / \|w\|$, which is equivalent to
minimizing $\|w\| / (2a)$, or minimizing $\|w\|^2$.
Note: We can reach a similar conclusion by following the
approach in Section 8.6.1 (Robust Linear Discrimination) and
Problem 8.23 of [1], with slight modifications of the affine functions,
and solving the dual problem.
2 (b):
By the change of variables $w := w / a$, $b := b / a$, the equivalent
convex optimization problem can be written as:
$$\begin{array}{ll} \text{minimize} & \|w\|^2 \\ \text{subject to} & w^T x_i + b \ge 1, \quad y_i = +1 \\ & w^T x_i + b \le -1, \quad y_i = -1 \end{array}$$
It can also be stated as:
$$\begin{array}{ll} \text{minimize} & \|w\|^2 \\ \text{subject to} & y_i (w^T x_i + b) \ge 1 \end{array}$$
It is a quadratic programming problem with affine constraints.
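One way to solve this quadratic program numerically is sketched below. It assumes that `quadprog` from the MATLAB Optimization Toolbox is available, and it uses a small made-up data set. This is not the attached solution code for Problem 2. The decision variable is stacked as `z = [w; b]`, the objective is scaled by one half (which does not change the minimizer), and the constraints are rewritten in the `A*z <= c` form that `quadprog` expects.

```matlab
% Hypothetical separable data (illustrative only); rows of X are points.
X = [2 2; 3 3; 0 0; -1 0];
y = [1; 1; -1; -1];
[l, n] = size(X);

% quadprog minimizes 0.5*z'*H*z + f'*z with z = [w; b].
H = blkdiag(eye(n), 0);          % penalize w only, not the offset b
f = zeros(n + 1, 1);
% y_i*(w'*x_i + b) >= 1 rewritten as A*z <= c.
A = -diag(y) * [X, ones(l, 1)];
c = -ones(l, 1);

z = quadprog(H, f, A, c);        % assumes the Optimization Toolbox is installed
w = z(1:n);
b = z(n + 1);
margin = 2 / norm(w);            % separation achieved by the optimal hyperplane
```

For this data the closest points of the two classes are (2,2) and (0,0), so the solver should return approximately `w = [0.5; 0.5]` and `b = -1`.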
III. PROBLEM III
We are given a set of measurements $y$ that are the output of
some function $f(x)$:
$$y = f(x) + n$$
where $n$ corresponds to noise generated according to some
probability distribution $p$. We wish to fit a function $\hat{f}$ to the data
given $M$ input-output pairs $\{(x_i, y_i)\}_{i=1}^{M}$. We can formulate
several minimization criteria, some of which include:
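Sum Squared Error:
$$\sum_{i=1}^{M} \left( f(x_i) - \hat{f}(x_i) \right)^2 \quad (1)$$
Mean Squared Error:
$$\frac{1}{M} \sum_{i=1}^{M} \left( f(x_i) - \hat{f}(x_i) \right)^2 \quad (2)$$
Root Mean Squared Error:
$$\sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( f(x_i) - \hat{f}(x_i) \right)^2} \quad (3)$$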
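The three criteria differ only by a scaling and a square root, as the short sketch below shows. The vectors `f_true` and `f_hat` hold made-up values standing in for $f(x_i)$ and $\hat{f}(x_i)$; the names are assumptions for illustration.

```matlab
% Hypothetical values standing in for f(x_i) and the fitted f_hat(x_i).
f_true = [1.0; 2.0; 3.0; 4.0];
f_hat  = [1.1; 1.9; 3.2; 3.8];
M = numel(f_true);

r = f_true - f_hat;    % residuals
SSE  = sum(r.^2);      % sum squared error
MSE  = SSE / M;        % mean squared error
RMSE = sqrt(MSE);      % root mean squared error
```

Because MSE and RMSE are monotone transformations of SSE, all three criteria are minimized by the same fitted function.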
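The problem is not convex in general. It is convex when the squared
difference $(f(x_i) - \hat{f}(x_i))^2$ is convex in the parameters being
estimated ($a$, $b$, $c$ in the case of Problem 3(b)), which holds when
$\hat{f}$ is linear in those parameters, as in a polynomial fit.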
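For a quadratic fit the convexity can be checked directly: the sum squared error is a quadratic function of the coefficients with Hessian $2 X^T X$, where $X$ is the Vandermonde matrix. The sketch below uses noise-free made-up data generated from a known quadratic (an assumption for illustration, not the data of Problem 3) and recovers the coefficients by least squares.

```matlab
% Hypothetical noise-free data from 1 + 2x + 0.5x^2 (illustrative only).
x = (0:0.5:3)';
y = 1 + 2*x + 0.5*x.^2;

X = [ones(size(x)), x, x.^2];      % Vandermonde matrix, columns [1, x, x^2]
Hess = 2 * (X' * X);               % Hessian of the SSE in the coefficients
isConvex = all(eig(Hess) >= 0);    % positive semidefinite Hessian

coef = X \ y;                      % least-squares coefficients
SSE = sum((X * coef - y).^2);
```

With exact quadratic data the recovered coefficients match the generating ones and the SSE is zero up to rounding.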
Problem 3(b)
Figure 2: Polynomial Coefficients a,b,c
Figure 3: Least Squares Polynomial Fit (axes: x, f(x))
Sum Squared Error (SSE) = 0.
Mean Squared Error (MSE) = 0.
Root Mean Squared Error= 0.
See Annex A and attached files for code
REFERENCES
[1] Stephen Boyd and Lieven Vandenberghe, Convex Optimization,
Cambridge University Press, 2006.
[2] Support Vector Machines, Optimum Separation Hyperplane,
http://www.support-vector-machines.org/SVM_osh.html
[3] Steve R. Gunn, Support Vector Machines for Classification
and Regression, TR-
[4] University of Southampton, MATLAB Support Vector
Machine Toolbox,
http://www.isis.ecs.soton.ac.uk/resources/svminfo/
ANNEX A
A. Code for Problem 3
%This script calculates the Polynomial Least Squares fit for the
%data, based upon the algorithm described at:
%http://mathworld.wolfram.com/LeastSquaresFittingPolynomial.html
clear all
load problem3_v
x=problem3_Data(:,1);
y=problem3_Data(:,2);
n=2; %Degree of the fitted polynomial
X(:,n+1) = ones(length(x),1);
%Generating a Vandermonde Matrix with columns [x.^2, x, 1]
for j = n:-1:1
X(:,j) = x.*X(:,j+1);
end
X=fliplr(X); %Reorder the columns to [1, x, x.^2]
%Getting the Coefficients by solving the normal equations
a=(X'*X)\(X'*y);
len_x=length(x);
%Obtaining an Approximation
for i=1:len_x
y_new(i)=a(1)+(a(2)*x(i))+(a(3)*x(i)^2);
end
%Plotting the Results
plot(x,y_new);
hold on
plot(x,y,'r')
xlabel('x')
ylabel('f(x)')
title('Least Squares Polynomial Curve Fit');
grid on
hold off
figure;
%a is a column vector, so fliplr leaves its order unchanged
stem(fliplr(a));
title('Polynomial Coefficients a,b,c')
grid on
y_new=y_new(:);
e=sum((y_new-y).^2); %Error Function (sum squared error)
B. Code for Problem 2
Please see the attached zipped files.