Matrix Exponential Update for Kernel Learning from Similarity Matrices | Study Guides, Projects, Research Computer Graphics

CMPS 290C Project Report

Learning Kernel Matrix by Matrix Exponential Update

Jun Liao

1. Introduction

In traditional machine learning, an example is represented by a vector of values. In many applications, however, it is more natural to represent the relationship between two examples by a similarity score. In these cases, the data form a matrix. The matrix may have missing values, indicating that the relationship between two examples is unknown or that the two cannot be directly compared. The matrix may be asymmetric, and the relationship between examples may be an ordered one. The matrix is also not necessarily square: in a common case, only a certain number of (positive) examples are of interest, and similarity scores are computed between a large number of examples and these examples of interest. The nearest-neighbor method is the most natural and widely used method for this kind of data.

In this project, we apply kernel-learning methods to similarity matrix data. For simplicity, we assume that the similarity matrix is square. The obstacle is that a similarity matrix is not, in general, a kernel matrix: a kernel matrix must be square, symmetric and positive semidefinite, and must not contain missing values.

A simple way to construct a kernel matrix from a similarity matrix A is as follows. First, when only one of the entries (i,j) and (j,i) is missing, we copy its symmetric counterpart into the missing position; any entries that are still missing are set to zero. We then average the two symmetric positions, A = (A + A')/2, where A' is the transpose of A, to obtain a symmetric matrix. If A is not positive semidefinite, we add a positive constant λ to its diagonal elements to make it positive definite: K = A + λI, where λ is set slightly greater than the absolute value of the minimum eigenvalue of A. We call this the “naïve” approach, and a kernel matrix constructed this way is called a “Diag” kernel.

A second approach, the “diffusion” kernel [2], makes use of a property of the matrix exponential function: the matrix exponential always maps a symmetric matrix to a symmetric positive definite matrix, because its eigenvalues are the exponentials of the eigenvalues of the original matrix. The resulting matrix can therefore be used as a kernel matrix. In this approach, K = exp(βA), where exp is the matrix exponential function, A is a symmetric matrix and β is a constant.

The third approach, matrix exponential update [1], is an on-line algorithm. It also makes use of this property of the matrix exponential function, so it is closely related to the diffusion kernel.
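The first two constructions described above, the “Diag” kernel and the diffusion kernel, can be illustrated with a minimal sketch. It assumes NumPy, and it assumes that missing values are encoded as NaN. The function names (symmetrize, diag_kernel, diffusion_kernel), the margin parameter and the toy matrix S are hypothetical and do not come from the report. The matrix exponential is computed here through an eigendecomposition, which is valid because A is symmetric after the symmetrization step.

```python
import numpy as np


def symmetrize(S):
    """Fill missing entries (NaN, an assumed encoding) and symmetrize S."""
    A = np.array(S, dtype=float)
    missing = np.isnan(A)
    # Where only one of (i, j) and (j, i) is missing, copy the known counterpart.
    A[missing] = A.T[missing]
    # Entries missing in both positions are still NaN: set them to zero.
    A[np.isnan(A)] = 0.0
    # Average the two symmetric positions: A = (A + A') / 2.
    return (A + A.T) / 2.0


def diag_kernel(A, margin=1e-6):
    """'Diag' kernel K = A + lam * I for a symmetric matrix A.

    lam is slightly greater than |minimum eigenvalue| when A is not
    positive semidefinite; `margin` (hypothetical) sets how much greater.
    """
    min_eig = np.linalg.eigvalsh(A).min()
    lam = abs(min_eig) + margin if min_eig < 0 else 0.0
    return A + lam * np.eye(A.shape[0])


def diffusion_kernel(A, beta=1.0):
    """Diffusion kernel K = exp(beta * A) for a symmetric matrix A."""
    # For symmetric A = V diag(w) V', exp(beta * A) = V diag(exp(beta * w)) V'.
    w, V = np.linalg.eigh(A)
    return (V * np.exp(beta * w)) @ V.T


# Toy similarity matrix (made-up values) with missing entries.
S = np.array([[1.0, 1.4, np.nan],
              [1.0, 1.0, np.nan],
              [0.6, np.nan, 1.0]])

A = symmetrize(S)                       # symmetric, no missing values
K_diag = diag_kernel(A)                 # shifted so all eigenvalues are positive
K_diff = diffusion_kernel(A, beta=0.5)  # positive definite by construction
```

In this toy example the symmetrized matrix A has a negative eigenvalue, so the diagonal shift in diag_kernel is actually applied, whereas diffusion_kernel needs no shift. The value β = 0.5 is arbitrary and chosen only for illustration.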

However, matrix exponential update is more sophisticated: it is derived using the von Neumann divergence and the square loss. A relative loss bound has been established for this
