




























Material Type: Notes; Class: Advanced Topics in Computer Graphics; Subject: Computer Science; University: University of California-Santa Cruz; Term: Unknown 2009;





























Manfred K. Warmuth
University of California - Santa Cruz
UC Berkeley, March 18, 2009
1 Kernel Paradigm
2 An iff condition for the kernel paradigm
3 The matrix case
4 Efficiency
Kernel Paradigm
Examples: $\langle S \rangle = \langle (x_1, y_1), \ldots, (x_t, y_t) \rangle$
Linear hypothesis: $w(\langle S \rangle) = w$
Predicts with $w \cdot x$ on instance $x$
Kernel Paradigm
Simply invent new features :-)
Kernel Paradigm
Can you always improve things by inventing new features?
Fitting the data may be
Is this learning?
Short answer:
If good fit with non-sparse solution - maybe
If good fit with sparse solution - no
Kernel Paradigm
If $w$ is a linear combination of expanded instances, then
$$\hat{y} = \underbrace{\sum_i a_i \phi(z_i)}_{w} \cdot \phi(z) = \sum_i a_i \underbrace{\phi(z_i) \cdot \phi(z)}_{K(z_i, z)}$$
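The identity above can be checked numerically. The following is a minimal sketch, not taken from the slides: the feature map `phi` (degree-2 monomials on 2-d inputs), the matching kernel `K(u, v) = (u · v)^2`, and all data values are illustrative assumptions.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(z):
    """Hypothetical feature expansion of a 2-d instance (degree-2 monomials)."""
    z1, z2 = z
    return (z1 * z1, math.sqrt(2.0) * z1 * z2, z2 * z2)

def K(u, v):
    """Kernel matching phi above: K(u, v) = (u . v)^2 = phi(u) . phi(v)."""
    return dot(u, v) ** 2

# Illustrative instances z_i, coefficients a_i, and a query instance z.
instances = [(1.0, 2.0), (-0.5, 1.5), (3.0, -1.0)]
a = [0.3, -1.2, 0.7]
z = (2.0, 0.5)

# Primal view: build w = sum_i a_i phi(z_i) explicitly, then predict w . phi(z).
w = [sum(a_i * phi(z_i)[j] for a_i, z_i in zip(a, instances)) for j in range(3)]
y_primal = dot(w, phi(z))

# Dual view: never form w or phi; only kernel evaluations K(z_i, z) are used.
y_dual = sum(a_i * K(z_i, z) for a_i, z_i in zip(a, instances))

print(y_primal, y_dual)  # the two predictions agree up to rounding
```

The dual computation touches the instances only through `K`, which is the point of the paradigm: the expanded features are never written down.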
Kernel Paradigm
[Figure: graph from source to sink; edges labeled $1$ and $z_i \tilde{z}_i$ for $i = 1, 2, 3$]
One term per path
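One possible reading of "one term per path", stated here as an assumption because the figure itself did not survive: each stage $i$ offers two parallel edges with weights $1$ and $z_i \tilde{z}_i$, so that summing the edge-weight products over all source-to-sink paths gives the product kernel $\prod_i (1 + z_i \tilde{z}_i)$. A minimal sketch of that reading, with made-up values:

```python
from itertools import product
from math import prod

# Illustrative instance pair (values are assumptions, not from the slides).
z = (0.5, -2.0, 3.0)
z_tilde = (1.5, 0.25, -1.0)

# Closed form: one factor per stage of the assumed graph.
kernel_closed_form = prod(1.0 + a * b for a, b in zip(z, z_tilde))

# Path enumeration: at each stage a path takes the edge of weight 1 (choice 0)
# or the edge of weight z_i * z~_i (choice 1); a path contributes the product
# of its edge weights.
path_sum = 0.0
num_paths = 0
for choice in product((0, 1), repeat=len(z)):
    weight = 1.0
    for take, a, b in zip(choice, z, z_tilde):
        if take:
            weight *= a * b
    path_sum += weight
    num_paths += 1

print(num_paths, kernel_closed_form, path_sum)
```

With $n$ stages there are $2^n$ paths, so the enumeration is exponential while the closed form needs only $n$ multiplications.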
Kernel Paradigm
Many of our favorite algorithms can be “kernelized”: Linear Least Squares, Widrow-Hoff, Support Vector Machines, PCA, “Simplex Algorithm”, ...
Question:
What is the class of algorithms that can be kernelized?
Kernel Paradigm
Representer Theorem: [KW71]
$$w = \arg\inf_{\tilde{w}} \Big( \|\tilde{w}\|^2 + \eta \sum_i \mathrm{loss}_i(\tilde{w} \cdot x_i) \Big)$$
Solution $w$ is a linear combination of the $\phi(x_i)$
This is a sufficient condition for the parameter vector $w$ to be a linear combination of instances
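For the special case of square loss, the claim can be verified directly. The sketch below is an illustration under assumptions (square loss $\mathrm{loss}_i(p) = (p - y_i)^2$, unexpanded instances, made-up data, and a hand-written helper `solve`): it minimizes the objective once in primal form and once over coefficients $a$ with $w = \sum_i a_i x_i$, and compares the two.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

# Two instances in R^3, so their span is a proper subspace (illustrative data).
X = [(1.0, 0.0, 2.0), (0.0, 1.0, -1.0)]
y = [1.0, -2.0]
eta = 0.5
n, t = 3, 2

# Primal: setting the gradient of ||w||^2 + eta * sum_i (w . x_i - y_i)^2
# to zero gives (I + eta * sum_i x_i x_i') w = eta * sum_i y_i x_i.
A = [[(1.0 if j == k else 0.0) + eta * sum(x[j] * x[k] for x in X)
      for k in range(n)] for j in range(n)]
b = [eta * sum(y_i * x[j] for x, y_i in zip(X, y)) for j in range(n)]
w_primal = solve(A, b)

# Dual: substituting w = sum_i a_i x_i gives (I + eta * K) a = eta * y,
# where K is the kernel matrix of dot products between instances.
Kmat = [[dot(xi, xj) for xj in X] for xi in X]
A_dual = [[(1.0 if i == j else 0.0) + eta * Kmat[i][j] for j in range(t)]
          for i in range(t)]
a = solve(A_dual, [eta * y_i for y_i in y])
w_dual = [sum(a_i * x[j] for a_i, x in zip(a, X)) for j in range(n)]

print(w_primal, w_dual)  # same vector up to rounding
```

The dual system is $t \times t$ and depends on the instances only through the kernel matrix, while the primal system is $n \times n$.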
An iff condition for the kernel paradigm
If $U$ is an orthogonal matrix, then $\langle US \rangle$ denotes the sequence
$$\langle (Ux_1, y_1), (Ux_2, y_2), \ldots, (Ux_t, y_t) \rangle$$
Transformation only affects the instances
$$X'X = (UX)'\,UX$$
Lemma
Two sequences of examples are orthogonal transformations of each other iff the kernel matrices associated with the sequences are the same
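The easy direction of the lemma (orthogonal transformation implies equal kernel matrices) can be checked numerically. A minimal sketch, assuming a 2-d rotation for $U$ and made-up instances; it does not demonstrate the converse direction.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(U, x):
    """Matrix-vector product U x, with U given as a list of rows."""
    return tuple(dot(row, x) for row in U)

def kernel_matrix(instances):
    """Matrix of all pairwise dot products x_i . x_j."""
    return [[dot(xi, xj) for xj in instances] for xi in instances]

# An orthogonal matrix: rotation by an arbitrary angle (illustrative choice).
theta = 0.7
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

instances = [(1.0, 2.0), (-0.5, 1.5), (3.0, -1.0)]
transformed = [apply(U, x) for x in instances]

K = kernel_matrix(instances)
K_transformed = kernel_matrix(transformed)

# Largest entrywise difference between the two kernel matrices.
max_gap = max(abs(a - b)
              for row_a, row_b in zip(K, K_transformed)
              for a, b in zip(row_a, row_b))
print(max_gap)  # zero up to rounding
```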
An iff condition for the kernel paradigm
Alg. produces weight vector
$w$ : example sequences $\to \mathbb{R}^n$
Actions of alg. depend on dot product
$w(\langle S \rangle) \cdot x$
$w$ is transformation invariant if
$$w(\langle S \rangle) \cdot x = w(\langle US \rangle) \cdot Ux,$$
for all sequences $\langle S \rangle$, orthogonal matrices $U$, and instances $x$
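As an illustration of the definition, the sketch below runs a plain Widrow-Hoff (LMS) update on a sequence and on its rotated copy and compares the two predictions. The zero start vector, the learning rate `eta`, the rotation, and the data are assumptions chosen for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(U, x):
    """Matrix-vector product U x, with U given as a list of rows."""
    return tuple(dot(row, x) for row in U)

def widrow_hoff(S, eta=0.1):
    """One pass of w <- w - eta * (w . x - y) * x, starting from w = 0."""
    w = [0.0] * len(S[0][0])
    for x, y in S:
        err = dot(w, x) - y
        w = [w_j - eta * err * x_j for w_j, x_j in zip(w, x)]
    return w

theta = 1.1  # arbitrary rotation angle (illustrative)
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

S = [((1.0, 2.0), 1.0), ((-0.5, 1.5), -1.0), ((3.0, -1.0), 2.0)]
US = [(apply(U, x), y) for x, y in S]   # only the instances are transformed
x_new = (2.0, 0.5)

pred = dot(widrow_hoff(S), x_new)
pred_transformed = dot(widrow_hoff(US), apply(U, x_new))
print(pred, pred_transformed)  # equal up to rounding
```

The predictions match because each update adds a multiple of the current instance, and every coefficient is computed from dot products, which $U$ preserves.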
An iff condition for the kernel paradigm
Consider alg.
$$w(\langle S \rangle) = \begin{cases} 0 & \text{if } x_{1,1} > 0 \\ x_1 & \text{otherwise} \end{cases}$$
Alg. satisfies
2a predicts with linear combination of instances
but not
2b transformed sequence → transformed matrix
1 transformation invariance
3 coefficients only depend on kernel matrix
Use $U = -I$
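The counterexample with $U = -I$ can be worked through concretely. A small sketch with made-up numbers (the sequence `S` and the query `x` are assumptions): the weight vector is always a multiple of $x_1$, yet the prediction changes under the transformation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def w_alg(S):
    """The algorithm from the slide: 0 if x_{1,1} > 0, else x_1."""
    x1, _y1 = S[0]
    if x1[0] > 0:
        return tuple(0.0 for _ in x1)
    return tuple(x1)

def negate(x):
    """Apply U = -I to an instance."""
    return tuple(-c for c in x)

S = [((1.0, 2.0), 1.0)]                 # first component of x_1 is positive
US = [(negate(x), y) for x, y in S]     # transformed sequence
x = (3.0, 1.0)

pred = dot(w_alg(S), x)                         # w = 0, so the prediction is 0
pred_transformed = dot(w_alg(US), negate(x))    # w = -x_1, so (-x_1).(-x) = x_1.x

print(pred, pred_transformed)  # 0.0 5.0
```

Both sequences have the same kernel matrix ($x_1 \cdot x_1 = 5$), so the differing predictions show that the coefficient on $x_1$ cannot be a function of the kernel matrix alone.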
An iff condition for the kernel paradigm
The parameter vector is a linear combination of instances
Too general?
“Algorithm only relies on dot products”
(Individual features never touched)
More general than characterization of above theorem