






This report examines the use of spectral graph partitioning for resource allocation problems, presenting the underlying theory and comparing it to the deterministic annealing algorithm. The author, Puneet Sharma, discusses spectral algorithms for graph partitioning, the spectral properties of the Markov transition matrix, and simulation results for various datasets. The document also touches upon the connection between spectral graph partitioning and Markov random walks.







In this report, I have presented the resource location problem in a graph-theoretic framework. Spectral algorithms for graph partitioning have been discussed, and a probabilistic interpretation of the spectral problem has been presented as a Markov random walk on graphs. Spectral properties of the Markov transition matrix have been utilized for bi-partitioning (and multi-partitioning) the graph. A comparison with the Deterministic Annealing (DA) algorithm has also been presented, and key differences between the two algorithms have been highlighted. Simulation results for graph partitioning have been presented for a number of datasets using both the spectral algorithm and DA.
Many problems that require selecting a subset from a given population, or partitioning an underlying domain, can be viewed as resource allocation problems, often referred to as locational optimization problems. Locational optimization algorithms arise in a number of contexts in control, for example motion coordination algorithms, coverage control [10], mobile sensing networks [1], image segmentation [11] and load balancing in distributed computing [4]. These problems share the fundamental goal of determining an optimal partition of the underlying domain in which they are defined (e.g., a library of compounds for drug discovery, an unknown area of interest for coverage control), and an optimal assignment of values, or elements, from a finite resource set to each cell in the partition. Computationally, these problems are typically complex and time intensive, if not intractable. For example, in the problem of drug discovery, determining 30 representative compounds from an array of 1000 compounds results in approximately 3 × 10^25 possibilities. Another factor that adds to the complexity of such problems is their inherent non-convexity. Thus we require an efficient algorithm that does not get stuck in local minima.
Spectral graph partitioning [2, 3] is one such technique, largely based on heuristics. The underlying data is mapped onto a graph, and the spectral properties of the graph are then used to recursively partition the data into clusters. Recently, spectral algorithms have been applied to data partitioning in a number of different applications [4, 11].
This report has been organized as follows. A brief introduction to graph theory has been presented in Section 2. The problem formulation for spectral partitioning and the solution to a relaxed problem have also been discussed there. In Section 3, I have presented a probabilistic interpretation of the spectral graph partitioning algorithm by modeling it as a Markov random walk on graphs. It is shown that the spectral properties of the Markov transition matrix can be utilized for graph partitioning. Section 4 contains a comparison of the spectral algorithm with the deterministic annealing algorithm. Section 5 contains the simulation results for the spectral algorithm on a variety of datasets. We finally conclude by discussing some of the future work in Section 6.
2 Spectral Graph Partitioning
Consider a weighted undirected graph $G = (V, E, W)$ with $N$ nodes, where the weight matrix $W = [w_{ij}]$ is symmetric ($w_{ij} = w_{ji}$) and non-negative ($w_{ij} \ge 0$).
Edge weight wij is chosen such that it represents the similarity (affinity) between the two nodes. For the resource allocation problem, we choose the edge weight from a Gaussian kernel function.
$$ w_{ij} = w(x_i, x_j) = \exp\left( -\frac{d(x_i, x_j)}{\sigma^2} \right) \tag{1} $$
where $d(x_i, x_j) = \| x_i - x_j \|^2$ is the squared Euclidean distance between the two points. The parameter $\sigma$ is a user-defined value, referred to as the kernel width; it sets the distance scale beyond which $w(x_i, x_j)$ is effectively zero. The choice of a correct width is critical for the performance of the algorithm. Any symmetric, non-negative function that decreases monotonically with increasing distance can be used as a kernel. The degree of a node is defined as the sum of the weights of all the edges connected to that node.
$$ d_i = \sum_{j=1}^{N} w(x_i, x_j) \tag{2} $$
Define the degree matrix D = diag(di). The Laplacian of the graph is defined as:
L = D − W (3)
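As a minimal sketch of the construction above, the following NumPy snippet builds the weight matrix from the Gaussian kernel and then forms the degrees and the Laplacian. The function names, the four sample points and the value sigma = 1 are illustrative assumptions and are not taken from the report.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Weight matrix with w_ij = exp(-||x_i - x_j||^2 / sigma^2), as in Eq. (1)."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)          # squared Euclidean distances
    return np.exp(-sq_dist / sigma**2)

def graph_laplacian(W):
    """Return the degree vector d (Eq. 2) and the Laplacian L = D - W (Eq. 3)."""
    d = W.sum(axis=1)                          # the sum includes j = i, as in Eq. (2)
    return d, np.diag(d) - W

# Hypothetical data: two tight pairs of points far apart from each other.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
W = gaussian_affinity(X, sigma=1.0)
d, L = graph_laplacian(W)
```

The diagonal entries of W cancel in D - W, so the self-weights w_ii = 1 produced by the kernel do not affect the Laplacian.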
We seek to partition the given graph into two disjoint sets A and B such that A ∪ B = V. We can accomplish this by simply removing all the edges that connect any node in A to any node in B. The amount of dissimilarity between these two sets can be computed by summing up the weights of all the edges removed. This sum is the value of cut(A, B):
$$ \mathrm{cut}(A, B) = \sum_{x_i \in A,\, x_j \in B} w(x_i, x_j) \tag{4} $$
The spectral theory of graph partitioning states that the eigenvectors of the graph Laplacian $L$ can be used for this bi-partitioning [2, 3]. The Laplacian matrix is symmetric positive semi-definite with pairwise orthogonal eigenvectors. Its smallest eigenvalue is $\lambda_1 = 0$, and thus the eigenvector associated with the second smallest eigenvalue solves the relaxed bi-partitioning problem.
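The bi-partitioning step can be sketched as follows. This is only an illustration under stated assumptions: the nodes are split by the sign of the second eigenvector (thresholding at zero is a common choice, not necessarily the rule used in this report), and the function name, the synthetic data and sigma = 1 are hypothetical.

```python
import numpy as np

def fiedler_bipartition(W):
    """Split the nodes by the sign of the eigenvector for the 2nd smallest eigenvalue of L."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    eigvals, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    second = eigvecs[:, 1]                     # eigenvector of the second smallest eigenvalue
    return second >= 0, eigvals

# Hypothetical data: two Gaussian blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(3.0, 0.3, size=(20, 2))])
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-sq_dist / 1.0**2)                  # Gaussian kernel with sigma = 1 (assumed)

in_A, eigvals = fiedler_bipartition(W)
cut_value = W[np.ix_(in_A, ~in_A)].sum()       # cut(A, B) from Eq. (4)
```

The overall sign of an eigenvector is arbitrary, so which blob ends up labelled A may differ between runs or libraries; only the split itself is meaningful.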
the finite dimensional Markov operator theory, stationary densities) to analyze the partitioning algorithm. At the same time, it will provide a good framework to compare it to the Deterministic Annealing algorithm for resource allocation problems. A random walk on a graph is a finite Markov chain that is time-reversible. Every Markov chain can be viewed as a random walk on a weighted directed graph. Additionally, any time-reversible Markov chain can be viewed as a random walk on an undirected graph. Consider the nodes of the graph as the states of a Markov chain. The transition probability from one state to another is proportional to the weight of the edge connecting the two states. In the case of the resource allocation problem, each data point represents a node on this graph, while the edge weight represents the affinity between two points. The transition probability $p_{ij}$ of moving from node $i$ to node $j$ (in one time step) is given by
$$ p_{ij} = \frac{w_{ij}}{d_i} $$
Thus the matrix $P = D^{-1} W$ is the transition matrix for the random walk defined on the graph $G$ (each row sums to 1). The spectrum of the transition matrix should give us information about the state of the random walk. All the eigenvalues of $P$ lie in the unit circle, with the largest value $\lambda_0 = 1 \ge \lambda_1 \ge \cdots \ge \lambda_{N-1}$. Also note that the eigenvector corresponding to $\lambda_0$ is the all-ones vector $\mathbf{1}_N$ (which defines the trivial partition, the whole set). The eigenvalue problem for the transition matrix is
$$ P x = D^{-1} W x = \lambda x \tag{12} $$
It should be noted that any pair $(x^*, \lambda^*)$ that satisfies this problem has the property that $(x^*, 1 - \lambda^*)$ satisfies the generalized eigenvalue problem of minimizing the Normalized cut (Equation 8).
$$ D^{-1} W x^* = \lambda^* x^* \iff (D - W) x^* = (1 - \lambda^*) D x^* \tag{13} $$
The eigenvector associated with the second smallest eigenvalue of the normalized cut problem therefore coincides with the eigenvector associated with the second largest eigenvalue of the Markov random walk transition matrix.
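The equivalence in Equation (13) can be checked numerically. The sketch below is an illustration only: it uses a small random symmetric weight matrix (an assumption, not data from this report) and obtains the eigenpairs of P through the symmetric matrix D^{-1/2} W D^{-1/2}, which has the same eigenvalues as P.

```python
import numpy as np

# Hypothetical weights: a small random symmetric, non-negative matrix.
rng = np.random.default_rng(1)
A = rng.random((6, 6))
W = (A + A.T) / 2
d = W.sum(axis=1)
D = np.diag(d)
P = W / d[:, None]                     # P = D^{-1} W, rows sum to 1

# S = D^{-1/2} W D^{-1/2} is symmetric and shares its eigenvalues with P.
s = 1.0 / np.sqrt(d)
S = W * s[:, None] * s[None, :]
lam, U = np.linalg.eigh(S)             # ascending order
lam, U = lam[::-1], U[:, ::-1]         # reorder so that lam[0] is the largest
V = U * s[:, None]                     # eigenvectors of P: x = D^{-1/2} u

# Residuals of both sides of Eq. (13), for all eigenpairs at once.
res_markov = P @ V - V * lam
res_ncut = (D - W) @ V - (D @ V) * (1.0 - lam)
```

Both residual matrices vanish to machine precision, and the ordering is reversed as stated: the largest eigenvalues of P correspond to the smallest eigenvalues of the generalized problem.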
The stationary density $p = [p_1 \; p_2 \; \cdots \; p_N]$ of the transition matrix satisfies $P^T p = p$. It can be easily verified that
$$ p_i = \frac{d_i}{\sum_{j} d_j} $$
defines the stationary density. If we start the random walk from this stationary distribution, then the probability that we will transition from set A to set B is given by
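$$ \Pr(A \to B \mid A) = \frac{\sum_{x_i \in A,\, x_j \in B} p_i P_{ij}}{\sum_{x_i \in A} d_i \,/\, \sum_{x_j \in V} d_j} = \frac{\sum_{x_i \in A,\, x_j \in B} w(x_i, x_j)}{\sum_{x_i \in A} d_i} = \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} $$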
Thus the Normalized cut criterion minimizes the probability that the walk will leave either of the sets A or B. This is closely related to the almost invariant sets of the random walk (lumping of the Markov matrix).
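The identity between the escape probability and cut(A, B)/vol(A) can be verified on a small example. The weight matrix and the choice of the set A below are arbitrary assumptions made for illustration.

```python
import numpy as np

# Hypothetical weights: a small random symmetric, non-negative matrix.
rng = np.random.default_rng(2)
M = rng.random((8, 8))
W = (M + M.T) / 2
d = W.sum(axis=1)
P = W / d[:, None]                     # transition matrix D^{-1} W
pi = d / d.sum()                       # stationary density p_i = d_i / sum_j d_j

in_A = np.array([True] * 3 + [False] * 5)    # arbitrary split into A and B

# Probability of stepping from A to B, given that the walk starts in A
# and is initialised from the stationary density.
joint = (pi[:, None] * P)[np.ix_(in_A, ~in_A)].sum()
p_escape = joint / pi[in_A].sum()

cut_AB = W[np.ix_(in_A, ~in_A)].sum()
vol_A = d[in_A].sum()
```

Here p_escape and cut_AB / vol_A agree up to rounding, and pi is left unchanged by one step of the walk.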
In [6], it was shown that we can use the eigenvectors associated with K eigenvalues of the Laplacian matrix to partition the data into K sets. This partitioning is efficient if the Markov matrix is block stochastic. The cost function for a multi-way cut is:
$$ \mathrm{MNCut} = \sum_{i=1}^{K} \sum_{j=1,\, j \neq i}^{K} \frac{\mathrm{cut}(A_i, A_j)}{\mathrm{vol}(A_i)} $$
The probabilistic interpretation is that the multiway cut tries to find a partition $\{A_i\}$ which minimizes the probability of the random walk escaping from each of the sets. I have presented partitioning results based on the multiway cut in Section 5.
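To make the multiway cost concrete, the following sketch evaluates MNCut for a given labelling. The function name, the block-structured weight matrix and the two labellings are hypothetical; the code only evaluates the cost and does not implement the partitioning method of [6].

```python
import numpy as np

def multiway_ncut(W, labels):
    """MNCut = sum_i sum_{j != i} cut(A_i, A_j) / vol(A_i) for a given labelling."""
    d = W.sum(axis=1)
    total = 0.0
    for k in np.unique(labels):
        inside = labels == k
        vol = d[inside].sum()                          # vol(A_k)
        cut_out = W[np.ix_(inside, ~inside)].sum()     # sum over j != k of cut(A_k, A_j)
        total += cut_out / vol
    return total

# Hypothetical graph: three groups of three nodes, strong weights inside
# a group (1.0) and weak weights between groups (0.01).
groups = np.repeat([0, 1, 2], 3)
W = np.where(groups[:, None] == groups[None, :], 1.0, 0.01)

cost_natural = multiway_ncut(W, groups)                          # follows the blocks
cost_mixed = multiway_ncut(W, np.array([0, 1, 2] * 3))           # cuts across the blocks
```

The labelling that follows the blocks gives a much smaller cost than the one that cuts across them, in line with the escape-probability interpretation.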
4 Comparison of Deterministic Annealing with Spectral Algorithms
The Deterministic Annealing (DA) algorithm is another method to partition the underlying data into clusters. The partitioning criterion used in DA is quite different from the ones used in Spectral methods. In this section, I have tried to highlight some of the connections between these two seemingly different methods. In the DA framework, the fundamental resource allocation problem is stated as: Given a distribution p(x) of the elements x in a descriptor space Ω, find the best set of M resource locations rj that solves the following minimization problem:
$$ \min_{r_j,\; 1 \le j \le M} \int_{\Omega} p(x) \left\{ \min_{1 \le j \le M} d(x, r_j) \right\} dx. \tag{14} $$
Alternatively, this problem can also be formulated as finding an optimal partition of the descriptor space $\Omega$ into $M$ cells $R_j$ and assigning to each cell $R_j$ a resource location $r_j$ such that the following cost function is minimized:
$$ \sum_{j} \int_{R_j} d(x, r_j)\, p(x)\, dx. $$
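For a finite dataset the integral becomes a weighted sum, which the sketch below evaluates for a given set of resource locations. It assumes the squared Euclidean distance for d and a uniform p(x) unless weights are supplied; the function name and the sample points are hypothetical. It only evaluates the cost and is not the DA algorithm.

```python
import numpy as np

def hard_distortion(X, R, p=None):
    """Discrete analogue of the objective in Eq. (14): sum_i p(x_i) * min_j d(x_i, r_j)."""
    n = X.shape[0]
    p = np.full(n, 1.0 / n) if p is None else np.asarray(p, dtype=float)
    sq_dist = ((X[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)   # d(x_i, r_j), assumed squared Euclidean
    nearest = sq_dist.argmin(axis=1)           # index of the cell R_j containing x_i
    cost = float((p * sq_dist[np.arange(n), nearest]).sum())
    return cost, nearest

# Hypothetical example: four points and two resource locations.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
R = np.array([[0.0, 1.0], [10.0, 1.0]])
cost, nearest = hard_distortion(X, R)
```

Each point is assigned to its nearest resource location, so the cells R_j are determined by the locations r_j; this is the local influence of each element that the hard formulation implies.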
The DA algorithm [8, 9] eliminates this local influence of domain elements by allowing each element x ∈ Ω to be associated with every resource location rj through a weighting parameter p(rj |x). The DA formulation includes a modified distortion term
$$ D = \int_{\Omega} p(x) \sum_{j} d(x, r_j)\, p(r_j \mid x)\, dx, $$
and an entropy term
$$ H = -\int_{\Omega} p(x) \sum_{j} p(r_j \mid x) \log p(r_j \mid x)\, dx, $$
which measures the randomness of the distribution of associated weights. Minimizing the Free Energy term F = D − T H with respect to association probabilities
Figure 1 shows the partitioning results obtained from a multicut spectral criterion using 4 eigenvalues. The data has been segmented into four natural clusters by the algorithm. An interesting feature of this partitioning is that it is analogous to the physical diffusion process. The ring (blue) is identified as a single cluster because a diffusion process starting inside the annulus will tend to remain inside after a long time. This feature of the spectral partitioning algorithm makes it useful for image segmentation.
Figure 1: Partitioning using Ncut
Figure 2 shows the partitioning results for a case where the Markov chain is lumpable. It consists of 7 natural Gaussian clusters which are separated from one another. The spectral method identifies the natural clusters using the first 7 eigenvalues of the Laplacian matrix. Figure 3 shows the clustering results obtained from the Deterministic Annealing algorithm.
Figure 2: Partitioning determined by Spectral Multicut
For small datasets, the spectral graph partitioning algorithm takes much less time than the DA algorithm. For the dataset shown in Figure 2, the spectral method took just 3.4 seconds, while the DA algorithm took 8.36 seconds to determine the partitions.
Figure 3: Partitioning determined by Deterministic Annealing
Figure 4: Partitioning determined by Spectral Multicut
One of the drawbacks of spectral graph partitioning algorithms is their inability to deal with large datasets. The large size of the Laplacian matrix makes the computation of the eigenvalues very time intensive. To demonstrate this, I used a dataset containing two Gaussian clusters with 5000 points each. The DA algorithm was able to identify the two clusters efficiently, but the spectral algorithm could not be tested in MATLAB because of memory overflow during the computation of the 2nd eigenvalue of the Laplacian matrix.
The spectral algorithm also requires the number of clusters a priori. The partitioning results are not very promising when the number of natural clusters is not known. Figure 5 shows the partitioning results for two clusters, although it is clear that the data has 3 (or 4) natural clusters. On the other hand, the DA algorithm identifies the clusters hierarchically (Figure 6). This points to the fact that an effective stopping mechanism has to be designed for the spectral methods.
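One possible way around the memory limitation, offered here only as a sketch and not as the method used in this report, is to avoid forming the N x N matrix altogether. The code below runs a power iteration on P = D^{-1} W, computing the Gaussian kernel in row blocks and removing the trivial all-ones component at every step. The function names, chunk size, iteration count, data and sigma are all assumptions, and the approach trades memory for repeated kernel evaluations.

```python
import numpy as np

def kernel_matvec(X, v, sigma, chunk=256):
    """Return (W @ v, degrees) for the Gaussian kernel without storing W."""
    n = X.shape[0]
    Wv = np.empty(n)
    deg = np.empty(n)
    for start in range(0, n, chunk):
        block = X[start:start + chunk]
        sq_dist = ((block[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        Wb = np.exp(-sq_dist / sigma**2)       # one block of rows of W
        Wv[start:start + chunk] = Wb @ v
        deg[start:start + chunk] = Wb.sum(axis=1)
    return Wv, deg

def second_eigvec_power(X, sigma, iters=200, seed=0):
    """Approximate the eigenvector of P = D^{-1} W for its second largest eigenvalue."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(X.shape[0])
    for _ in range(iters):
        Wx, deg = kernel_matvec(X, x, sigma)   # degrees are recomputed for simplicity
        x = Wx / deg                           # x <- P x
        x -= (deg @ x) / deg.sum()             # remove the component along the all-ones vector
        x /= np.linalg.norm(x)
    return x

# Hypothetical data: two Gaussian clusters of 30 points each.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)),
               rng.normal(6.0, 0.5, size=(30, 2))])
labels = second_eigvec_power(X, sigma=2.0) > 0
```

The deflation step uses the fact that the eigenvectors of P are orthogonal in the inner product weighted by the degrees. Convergence slows down when the second and third eigenvalues of P are close, so the fixed iteration count here is only suitable for well-separated data.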
6 Future Work
As was pointed out earlier, the spectral algorithms do not scale well with the size of the underlying data. A better numerical scheme for the computation of eigenvalues can be implemented to overcome this problem. An analytical comparison of the spectral method with the deterministic annealing
datasets were discussed
References
[1] J. Cortés, S. Martínez, T. Karatas, and F. Bullo. Coverage Control for Mobile Sensing Networks. IEEE Transactions on Robotics and Automation, 20(2):243–255, 2004.
[2] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Math. J., 23:298–305, 1973.
[3] M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czechoslovak Math. J., 25:619–633, 1975.
[4] B. Hendrickson and M. Leland. Multidimensional spectral load balancing. Sandia Natl. Labs, 1993.
[5] S. Lafon and A. Lee. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization. To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.
[6] M. Meila and L. Xu. Multiway cuts and spectral clustering. Neural Information Processing Systems, 2003.
[7] B. Nadler, S. Lafon, R. Coifman, and I. Kevrekidis. Diffusion maps, spectral clustering and the reaction coordinates of dynamical systems. submitted to Applied and Computational Harmonic Analysis (2004).
[8] K. Rose. Constrained Clustering as an Optimization Method. IEEE Trans. Pattern Anal. Machine Intell., 15:785–794, August 1993.
[9] K. Rose. Deterministic Annealing for Clustering, Compression, Classification, Regression and Related Optimization Problems. Proceedings of the IEEE, 86(11):2210–39, November
[10] S. Salapaka and A. Khalak. Constraints on Locational Optimization Problems. Proc. IEEE Control and Decision Conference, pages 1741–1746, December 2003.
[11] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, August 2000.