Node partitioning

Number of partitions, k

See statistics ref for further info.

Minimum cut objectives

Maybe find argmin(Vi)cut((Vi)): but V1=V minimizes it. So, to balance the partitions, associate each vertex with a weight w(v), get W = diag(w(v)); define w(Vi)=vViw(v); minimize Q((Vi))=cut(Vi)w(Vi).

Ratio, normalized cuts

If w(v)=1, get ratio cut objective fn. So, ratio cut is trying to minimize the weight of cross-cut edges, averaged over the nodes.

If w(v)=Ev,w, get normalized cut objective fn: N((Vi))=cut(Vi)w(Vi)=2within(Vi)w(Vi): so minimizing N is same as maximizing wt of edges lying within each partition. Normalized cut is trying to minimize the weight of cross-cut edges, averaged over the nodes but normalized to allow high-degree vertex-sets to have more cross-cut edges.

These are discrete optimization problems: NP complete \why. Effective heuristics exist.

Modularity of partition C

(Newman) Amount by which number of edges in C exceed Chung’s degree distribution preserving random graph model.

Spectral clustering

\core A relaxation of normalized cut objective finding a solution to the generalized eigenvector problem over the graph laplacian L, then partitioning points derived from the top k ev.

The objective

Represent cut by partition vector p{±1}n. Then Rayleigh quotient: pTLppTp=n14cut(V1,V2): pTp=n;pTLp=ei,jEei,j(pipj)2.

Use partition vector q with qi=w(V2)w(V1) for iV1; else w(V1)w(V2). Then qTW1=w(V2)w(V1)w(V1)w(V1)w(V2)w(V2)=0; qTWq=w(V).

As q=w(V1)+w(V2)2w(V1)w(V2)p+w(V1)+w(V2)2w(V1)w(V2)e; so qTLq=[w(V1)+w(V2)]24w(V1)w(V2)pTLp. As pTLp=cut(V1,V2); qTLqqTWq=Q(V1,V2).

Minimizing this subject to suitable q’s is still NP complete; so use real relaxation. So, solve minqqTLqqTWq subject to qTWe=0, deduce partition vector from solution. Opt problem solved when q is ev corresponding to λ2 for generalized ev problem: Lz=λWz. \why

To find k custers, do this recursively, or pick mlogk ev and partition points formed by putting various components of these ev together, maybe using k-means.

Requires O(n2) time: to find ev; O(kn2) to find k clusters among graph nodes.

Coclustering nodes in bipartite graph G=(A, B, E)

\core \ (Dhillon). Finding the second ev of L : take |A||B| adjacency matrix M, and find its 2nd left and right sv: as M is smaller than L, ye save on computation.

Graclus

Does kernel-k-means among nodes. Equivalent to solving a relaxation of the normalized cuts objective.