












These lecture slides introduce the matrix computations behind sparse coding. Key points discussed: sparse coding, overcomplete dictionaries, matching pursuit, basis pursuit, sparse representation of signals, compressive sensing, orthogonal matching pursuit, design of dictionaries, and maximum likelihood methods.













Sparse coding, overcomplete dictionary, matching pursuit, basis pursuit, K-SVD, applications
Using an overcomplete dictionary matrix $D \in \mathbb{R}^{n \times K}$ whose columns $\{d_j\}_{j=1}^{K}$ are $K$ prototype signal-atoms, a signal $y \in \mathbb{R}^{n}$ can be represented as a sparse linear combination of these atoms:
$$y = Dx, \quad \text{or} \quad y \approx Dx \ \text{ subject to } \ \|y - Dx\|_p \le \epsilon,$$
where the vector $x \in \mathbb{R}^{K}$ contains the representation coefficients of the signal $y$; the $\ell_p$-norms for $p = 1, 2,$ and $\infty$ are often used.
If $n < K$ and $D$ is a full-rank matrix, an infinite number of solutions are available for the representation problem, hence constraints on the solution must be set.
The sparsest representation is the solution of either
$$(P_0) \quad \min_{x} \|x\|_0 \ \text{ subject to } \ y = Dx \tag{1}$$
$$(P_{0,\epsilon}) \quad \min_{x} \|x\|_0 \ \text{ subject to } \ \|y - Dx\|_2 \le \epsilon \tag{2}$$
where $\|\cdot\|_0$ is the $\ell_0$-norm, counting the nonzero entries of a vector.
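To make the non-uniqueness concrete, here is a minimal NumPy sketch. The toy dictionary, the signal, and the helper name `l0` are all hypothetical choices made for illustration, not taken from the slides. It shows three exact representations of the same signal with different numbers of nonzero coefficients.

```python
import numpy as np

# Hypothetical toy dictionary: n = 2 rows, K = 3 atoms (overcomplete, full rank).
D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([2.0, 2.0])

def l0(x, tol=1e-10):
    """Count the entries of x whose magnitude exceeds tol (the l0 'norm')."""
    return int(np.sum(np.abs(x) > tol))

# Two exact representations of the same signal y = D x.
x_dense = np.array([2.0, 2.0, 0.0])   # uses the first two atoms
x_sparse = np.array([0.0, 0.0, 2.0])  # uses the third atom only

# Minimum l2-norm solution via the pseudoinverse: exact, but not sparse here.
x_min_l2 = np.linalg.pinv(D) @ y

for name, x in [("dense", x_dense), ("sparse", x_sparse), ("min-l2", x_min_l2)]:
    print(name, x, "nonzeros:", l0(x), "exact:", np.allclose(D @ x, y))
```

All three vectors satisfy $y = Dx$; problem $(P_0)$ asks for the one with the fewest nonzeros.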
The dictionary $D$ can either be chosen as a prespecified set of functions (i.e., non-adaptive) or designed by adapting its content to fit a given set of signal examples.
Prespecified transform matrices: wavelets, curvelets, contourlets, steerable wavelet filters, short-time Fourier transforms, random matrices, and more.
K-SVD: learn a dictionary $D$ from training examples.
Compressive sensing: use random matrices.
Matching pursuit (MP) is a greedy algorithm that finds the best matching projections of multidimensional data onto an overcomplete dictionary $D$.
Each such dictionary $D$ is a collection of waveforms $(\phi_\gamma)_{\gamma \in \Gamma}$, with $\gamma$ a parameter, and the signal is written as
$$y = \sum_{\gamma \in \Gamma} \alpha_\gamma \phi_\gamma, \quad \text{or} \quad y = \sum_{i=1}^{m} \alpha_{\gamma_i} \phi_{\gamma_i} + R^{(m)},$$
the latter being an approximate decomposition with residual $R^{(m)}$.
Start with an initial approximation $y^{(0)} = 0$ and residual $R^{(0)} = y$, and build up a sequence of sparse approximations stepwise.
At step $k$, identify the atom $\phi_{\gamma_k}$ that best correlates with the residual (by sweeping all samples), and then add to the current approximation a scalar multiple of that atom, so that
$$y^{(k)} = y^{(k-1)} + \alpha_k \phi_{\gamma_k}, \quad \alpha_k = \langle R^{(k-1)}, \phi_{\gamma_k} \rangle, \quad R^{(k)} = y - y^{(k)}.$$
After $m$ steps, one obtains the representation in (7) with residual $R = R^{(m)}$.
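The stepwise procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the slides' implementation: the function name `matching_pursuit` is hypothetical, the atoms are stored as the columns of `D`, and the columns are assumed to have unit $\ell_2$ norm so that the inner product is the correct scalar multiple.

```python
import numpy as np

def matching_pursuit(D, y, m):
    """Run m steps of plain matching pursuit.

    Assumes the columns (atoms) of D have unit l2 norm.
    Returns the coefficient vector x and the final residual R^(m).
    """
    x = np.zeros(D.shape[1])
    residual = np.array(y, dtype=float)      # R^(0) = y
    for _ in range(m):
        correlations = D.T @ residual        # <R^(k-1), phi_gamma> for every atom
        k = int(np.argmax(np.abs(correlations)))
        alpha = correlations[k]              # alpha_k = <R^(k-1), phi_gamma_k>
        x[k] += alpha                        # an atom may be selected more than once
        residual = residual - alpha * D[:, k]
    return x, residual

# With an orthogonal dictionary, MP recovers the sparse coefficients directly.
x, r = matching_pursuit(np.eye(3), np.array([3.0, 0.0, -2.0]), m=2)
print(x, np.linalg.norm(r))
```

Because the same atom can be picked again in a later step, the coefficients are accumulated with `+=` rather than overwritten.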
When the dictionary is orthogonal (e.g., an orthogonal wavelet basis), MP recovers the underlying sparse structure well.
The computational complexity of MP at the encoder is high.
Improvements include the use of approximate dictionary representations and suboptimal ways of choosing the best match at each iteration (atom extraction).
Orthogonal matching pursuit (OMP) adds an extra orthogonalization step to MP: take all $m$ terms that have entered the model by step $m$ and solve the least squares problem
$$\min_{(\alpha_i)} \Big\| y - \sum_{i=1}^{m} \alpha_i \phi_{\gamma_i} \Big\|_2$$
for the coefficients $(\alpha_i^{(m)})$.
Then form the residual
$$R^{[m]} = y - \sum_{i=1}^{m} \alpha_i^{(m)} \phi_{\gamma_i},$$
which is orthogonal to all terms currently in the model.
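A minimal sketch of the OMP steps described above, using NumPy's least squares solver for the orthogonalization step. The function name, the stopping tolerance, and the choice to return the support list are assumptions made for this illustration.

```python
import numpy as np

def orthogonal_matching_pursuit(D, y, m, tol=1e-12):
    """Select up to m atoms (columns of D) greedily, refitting all
    selected coefficients by least squares after each selection."""
    y = np.asarray(y, dtype=float)
    support = []                       # indices gamma_1, ..., gamma_m
    coeffs = np.zeros(0)
    residual = y.copy()
    for _ in range(m):
        if np.linalg.norm(residual) < tol:
            break                      # y is already represented exactly
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break                      # no new atom correlates with the residual
        support.append(k)
        # Least squares over every atom currently in the model.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[np.array(support, dtype=int)] = coeffs
    return x, residual, support

rng = np.random.default_rng(1)
D = rng.standard_normal((6, 10))
D /= np.linalg.norm(D, axis=0)         # unit-norm atoms
x, r, support = orthogonal_matching_pursuit(D, rng.standard_normal(6), m=3)
print(support, np.abs(D[:, support].T @ r).max())  # correlations are numerically zero
```

Unlike plain MP, the refit changes all previously selected coefficients at every step, which is what makes the residual orthogonal to the whole selected set.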
Consider a two-dimensional case
There is an intriguing relation between sparse representation and clustering (i.e., vector quantization).
In clustering, a set of descriptive vectors $\{d_k\}_{k=1}^{K}$ is learned, and each sample is represented by one of these vectors (based on a distance metric, e.g., the $\ell_2$-norm).
This can be thought of as an extreme sparse representation, where only one atom is allowed in the signal decomposition.
The $K$-means algorithm, also known as the generalized Lloyd algorithm (GLA), is the most commonly used procedure for clustering.
Dictionary learning can be considered a generalization of the $K$-means algorithm:
- given $\{d_k\}_{k=1}^{K}$, assign the training examples to their nearest neighbor
- given that assignment, update $\{d_k\}_{k=1}^{K}$ to better fit the examples
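The two alternating steps can be written so that the link to sparse coding is explicit: the assignment is stored as a coefficient matrix `X` whose columns each contain a single entry equal to 1. The sketch below is one $K$-means iteration under that view; the function name `kmeans_step` and the toy data are hypothetical.

```python
import numpy as np

def kmeans_step(D, Y):
    """One K-means (Lloyd) iteration.

    D: n x K codebook (one codeword per column), Y: n x N samples.
    Returns the updated codebook and the one-hot coefficient matrix X.
    """
    K, N = D.shape[1], Y.shape[1]
    # Squared l2 distance from every codeword to every sample, shape (K, N).
    dist2 = ((D[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    labels = dist2.argmin(axis=0)            # nearest-neighbor assignment
    # Extreme sparse code: exactly one nonzero (equal to 1) per column.
    X = np.zeros((K, N))
    X[labels, np.arange(N)] = 1.0
    # Codebook update: each codeword becomes the mean of its assigned samples.
    D_new = D.copy()
    for k in range(K):
        members = labels == k
        if members.any():                    # leave unused codewords unchanged
            D_new[:, k] = Y[:, members].mean(axis=1)
    return D_new, X

Y = np.array([[0.0, 1.0, 10.0, 11.0],
              [0.0, 0.0, 0.0, 0.0]])
D0 = np.array([[0.0, 10.0],
               [0.0, 0.0]])
D1, X = kmeans_step(D0, Y)
print(D1)
print(X)
```

With this notation the clustering error is $\|Y - DX\|_F^2$, the same quantity that dictionary learning minimizes once the one-nonzero restriction on the columns of `X` is relaxed.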
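Assuming a Laplace prior on the coefficients, the likelihood of each example is
$$p(y_i \mid D) = \int p(y_i \mid x, D)\, p(x)\, dx = C \int \exp\Big(-\frac{1}{2\sigma^2} \|Dx - y_i\|^2\Big) \exp\big(-\lambda \|x\|_1\big)\, dx.$$
This integral is difficult to evaluate, but the problem can be simplified to
$$D = \arg\max_{D} \sum_{i=1}^{N} \max_{x_i} p(y_i, x_i \mid D) = \arg\min_{D} \sum_{i=1}^{N} \min_{x_i} \big\{ \|Dx_i - y_i\|^2 + \lambda \|x_i\|_1 \big\} \tag{3}$$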
This problem does not penalize the entries of $D$ as it does those of $x_i$, so the solution tends to increase the dictionary entries.
An iterative method was suggested: first calculate the coefficients $x_i$ using a simple gradient descent procedure, and then update the dictionary using
$$D^{(n+1)} = D^{(n)} - \eta \sum_{i=1}^{N} \big(D^{(n)} x_i - y_i\big) x_i^{\top}$$
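The dictionary update rule above is a sum of outer products, which collapses to a single matrix expression when the coefficient vectors are stacked as the columns of `X` and the examples as the columns of `Y`. The sketch below shows that matrix form; the function name and the step size used in the example are assumptions, since the slides do not specify how $\eta$ is chosen.

```python
import numpy as np

def ml_dictionary_step(D, X, Y, eta):
    """One gradient step D <- D - eta * sum_i (D x_i - y_i) x_i^T,
    written in matrix form with x_i, y_i as the columns of X, Y."""
    return D - eta * (D @ X - Y) @ X.T

rng = np.random.default_rng(0)
D = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 6))
Y = rng.standard_normal((3, 6))

# Assumed step size: 1 / ||X||_2^2, small enough that the residual cannot grow.
eta = 1.0 / np.linalg.norm(X, 2) ** 2
D_new = ml_dictionary_step(D, X, Y, eta)
print(np.linalg.norm(Y - D @ X), np.linalg.norm(Y - D_new @ X))
```

The sketch covers only the dictionary step; the coefficients `X` are taken as given from the preceding coefficient-estimation stage.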
Related to independent component analysis (ICA) which maximizes the mutual information between inputs (samples) and outputs (coefficients)
The method of optimal directions (MOD) follows closely the $K$-means outline, with a sparse coding stage that uses either OMP or FOCUSS followed by an update of the dictionary.
Assuming that the sparse coding for each example is known, define the errors $e_i = y_i - Dx_i$; the overall representation error is
$$\|E\|_F^2 = \|[e_1, e_2, \ldots, e_N]\|_F^2 = \|Y - DX\|_F^2.$$
Assuming $X$ is fixed, we can seek an update to $D$ that minimizes this error. Setting the derivative with respect to $D$ to zero gives $(Y - DX)X^{\top} = 0$, and hence
$$D^{(n+1)} = Y X^{(n)\top} \big(X^{(n)} X^{(n)\top}\big)^{-1}$$
Related to the maximum likelihood methods
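The closed-form dictionary update above is an ordinary least squares solution and can be computed without forming the inverse explicitly. The sketch below assumes that $XX^{\top}$ is invertible (the rows of `X` are linearly independent, so in particular every atom is used by at least one example); the function name is hypothetical.

```python
import numpy as np

def mod_dictionary_update(Y, X):
    """Return D = Y X^T (X X^T)^{-1}, the minimizer of ||Y - D X||_F for fixed X.

    Assumes X X^T is invertible.
    """
    # Solve (X X^T) D^T = X Y^T instead of inverting X X^T.
    return np.linalg.solve(X @ X.T, X @ Y.T).T

rng = np.random.default_rng(0)
D_true = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 10))
D_hat = mod_dictionary_update(D_true @ X, X)
print(np.allclose(D_hat, D_true))
```

In the noiseless example the update recovers the generating dictionary; with noisy data it returns the dictionary that satisfies the normal equations $(Y - DX)X^{\top} = 0$.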
The sparse representation problem can be viewed as a generalization of the VQ problem (4), in which we allow each input signal to be represented by a linear combination of codewords (the dictionary atoms):
$$\min_{D, X} \|Y - DX\|_F^2 \ \text{ subject to } \ \forall i,\ \|x_i\|_0 \le T_0 \tag{5}$$
or
$$\min_{D, X} \sum_{i} \|x_i\|_0 \ \text{ subject to } \ \|Y - DX\|_F^2 \le \epsilon \tag{6}$$
K-SVD minimizes (5) iteratively: first fix $D$ and find the coefficient matrix $X$ using any pursuit method, then search for a better dictionary.
The dictionary update proceeds one column at a time: fixing all the other columns, find a new column $d_k$ and new values for its coefficients that best reduce the MSE.
Updating only one column of $D$ at a time is a problem with a straightforward solution based on the SVD.
Assume that both $X$ and $D$ are fixed, and put in question only one column $d_k$ of the dictionary together with the coefficients that correspond to it, the $k$-th row of $X$, denoted $x_T^k$ (different from the vector $x_k$, which is the $k$-th column of $X$).
The objective function can be rewritten as
$$\|Y - DX\|_F^2 = \Big\| Y - \sum_{j=1}^{K} d_j x_T^j \Big\|_F^2 = \Big\| \Big( Y - \sum_{j \ne k} d_j x_T^j \Big) - d_k x_T^k \Big\|_F^2 = \big\| E_k - d_k x_T^k \big\|_F^2.$$
This decomposes $DX$ into a sum of $K$ rank-1 matrices, where $K - 1$ terms are fixed and the $k$-th term remains in question.
It would be tempting to use the SVD to find alternative $d_k$ and $x_T^k$, since the SVD finds the closest rank-1 matrix that approximates $E_k$.
However, this minimization does not take sparsity into consideration.
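The loss of sparsity can be seen numerically. In the sketch below, all sizes, the random data, and the variable names are hypothetical. Atom $k$ is used by only three examples, yet the rank-1 SVD approximation of the full error matrix $E_k$ returns a coefficient row that is generically nonzero everywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, N, k = 4, 6, 10, 2

D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
Y = rng.standard_normal((n, N))

# Sparse coefficient matrix in which atom k is used by three examples only.
X = rng.standard_normal((K, N)) * (rng.random((K, N)) < 0.3)
X[k, :] = 0.0
X[k, [1, 4, 7]] = rng.standard_normal(3)

# E_k = Y - sum_{j != k} d_j x_T^j, computed by adding back atom k's term.
E_k = Y - D @ X + np.outer(D[:, k], X[k, :])

# Closest rank-1 approximation of E_k from the unrestricted SVD.
U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
d_new = U[:, 0]
x_row_new = s[0] * Vt[0, :]

nnz_before = int(np.count_nonzero(X[k, :]))
nnz_after = int(np.sum(np.abs(x_row_new) > 1e-12))
print(nnz_before, nnz_after)
```

The rank-1 fit lowers the error, but the new row no longer respects the support chosen in the sparse coding stage.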
Let $E_k^R$ denote the restricted matrix, obtained from $E_k$ by keeping only the columns of the examples that use atom $d_k$. The SVD decomposes it as $E_k^R = U \Sigma V^{\top}$.
Define the solution for $\tilde{d}_k$ as the first column of $U$, and the coefficient vector $x_R^k$ as the first column of $V$ multiplied by $\sigma_1$.
In the K-SVD algorithm, one sweeps through the columns and always uses the most updated coefficients as they emerge from the SVD steps.
Initialize: normalize the columns of the dictionary matrix $D^{(0)} \in \mathbb{R}^{n \times K}$
for $J = 1, 2, \ldots$ do
  Sparse coding: use any pursuit algorithm to compute the representation vector $x_i$ for each example $y_i$, $i = 1, \ldots, N$, by approximating the solution of
    $$\min_{x_i} \|y_i - D x_i\|_2^2 \ \text{ subject to } \ \|x_i\|_0 \le T_0$$
  Codebook update: for each column $k = 1, \ldots, K$ in $D^{(J-1)}$
    Define the group of examples that use this atom, $\omega_k = \{ i \mid 1 \le i \le N,\ x_T^k(i) \ne 0 \}$
    Compute the overall representation error $E_k = Y - \sum_{j \ne k} d_j x_T^j$
    Restrict $E_k$ by choosing only the columns corresponding to $\omega_k$, and obtain $E_k^R$
    Apply the SVD decomposition $E_k^R = U \Sigma V^{\top}$
    Choose the updated dictionary column $\tilde{d}_k$ to be the first column of $U$
    Update the coefficient vector $x_R^k$ to be the first column of $V$ multiplied by $\sigma_1$
end for
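The algorithm above can be sketched compactly in NumPy. This is an illustrative sketch under several assumptions that are not in the slides: the function names `omp` and `ksvd` are hypothetical, the pursuit stage is a small OMP routine, the dictionary is initialized from randomly chosen (nonzero) training examples, unused atoms are simply left unchanged, and the loop runs for a fixed number of iterations.

```python
import numpy as np

def omp(D, y, T0):
    """Small OMP routine used as the pursuit stage (at most T0 atoms)."""
    support, coeffs, residual = [], np.zeros(0), y.copy()
    for _ in range(T0):
        if np.linalg.norm(residual) < 1e-12:
            break
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[np.array(support, dtype=int)] = coeffs
    return x

def ksvd(Y, K, T0, n_iter=10, seed=0):
    """Learn an n x K dictionary from the columns of Y (requires K <= N)."""
    rng = np.random.default_rng(seed)
    N = Y.shape[1]
    # Initialize: K randomly chosen examples, columns normalized.
    D = Y[:, rng.choice(N, size=K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Sparse coding stage: one pursuit per example.
        X = np.column_stack([omp(D, Y[:, i], T0) for i in range(N)])
        # Codebook update stage: sweep the atoms, always using the latest D and X.
        for k in range(K):
            omega = np.flatnonzero(X[k, :])          # examples that use atom k
            if omega.size == 0:
                continue                             # assumption: skip unused atoms
            # Restricted error E_k^R: residual without atom k's contribution.
            E_kR = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
            U, s, Vt = np.linalg.svd(E_kR, full_matrices=False)
            D[:, k] = U[:, 0]                        # updated atom
            X[k, omega] = s[0] * Vt[0, :]            # updated coefficients x_R^k
    return D, X

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 50))
D, X = ksvd(Y, K=12, T0=3, n_iter=3)
print(np.linalg.norm(Y - D @ X), np.count_nonzero(X, axis=0).max())
```

Because each SVD is applied only to the columns in $\omega_k$, the coefficient supports chosen during sparse coding are preserved, and every updated atom keeps unit norm.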