Thursday, May 1, 2014

Normalized Graph Cuts and Spectral Algorithms

I recently read multiple papers about normalized graph cuts for segmentation and spectral algorithms. In this post I will highlight some of the similarities between the two. Both algorithms are graph based algorithms for clustering data. I will first show how a data set can used in such a graph environment using gram matrices and then talk more about graph cuts and spectral algorithms.

For N data points X = {x1 ... xN} we can construct a graph G=(V, E) by using every example xi as a vertex and connect every vertex to every other vertex. The weight of this edge eij is defined by a similarity function between xi and xj. Often we use a gaussian kernel as the similarity function:



If we represent this graph as a connectivity matrix, we end with a dense matrix called the gram matrix. In the gram matrix every entry Wij is set to the edge weight between vertex i and vertex j.




A cut in a graph partitions a graph G into two disjoint subsets A, B of vertices that share no edges.
This can be achieved by removing edges from the graph. So a cut is a set of edges when removed partitions the graph into A and B. The weight of the cut is the sum of the weights of this edge set:


For classification we could search the cut in the graph with minimum weight.  This means we remove the edges from the graph where the similarity is lowest. In that way, the two remaining sets A and B are the clusters. However, now the minimum cut might just separate only a few nodes from the rest of the graph. This can be avoided using normalized graph cuts which include the connectivity or degree of each vertex:



We can capture the connectivity in the denominator as the degree matrix. The degree matrix is a matrix with all zeros except the diagonal. Along the diagonal we capture the connectivity of each node:

 

Going towards the solution, the graph Laplacian is defined as:


With some math, we can find the solution by extracting the top k eigenvectors from the matrix:


If we have N examples and k vectors, the result is a k x N matrix. If we transpose it we have N vectors
of size k. Now we can cluster these new instances using k-means. This method is known as k-way approximate graph cut. In my opinion this algorithm is closely related to spectral clustering, which uses a very similar formulation [1]. Most of the post follows[2].


REFERENCES

  • [1] Andrew Y. Ng, Michael I. Jordan, Yair Weiss: On Spectral Clustering: Analysis and an algorithm. NIPS, 2001
    [2] Jianbo Shi, Jitendra Malik: Normalized Cuts and Image Segmentation, PAMI, 2000