See statistics ref for further info.
Maybe find $\min_{A,B} \mathrm{cut}(A,B)$, where $\mathrm{cut}(A,B) = \sum_{i \in A,\, j \in B} w_{ij}$: but cutting off a small vertex set (even a single vertex) often minimizes it. So, to balance the partitions, associate each vertex with a weight $w(v)$, get $W = \mathrm{diag}(w(v))$; define $w(A) = \sum_{v \in A} w(v)$; minimize $\mathrm{cut}(A,B)\left(\frac{1}{w(A)} + \frac{1}{w(B)}\right)$.
If $w(v) = 1$, get the ratio cut objective fn: $\mathrm{cut}(A,B)\left(\frac{1}{|A|} + \frac{1}{|B|}\right)$. So, ratio cut is trying to minimize the weight of cross-cut edges, averaged over the nodes.
If $w(v) = \deg(v)$, get the normalized cut objective fn: $N = \mathrm{cut}(A,B)\left(\frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)}\right)$, where $\mathrm{vol}(A) = \sum_{v \in A} \deg(v)$. Since $N = 2 - \left(\frac{\mathrm{assoc}(A,A)}{\mathrm{vol}(A)} + \frac{\mathrm{assoc}(B,B)}{\mathrm{vol}(B)}\right)$, minimizing $N$ is the same as maximizing the weight of edges lying within each partition. Normalized cut is trying to minimize the weight of cross-cut edges, averaged over the nodes but normalized so that high-degree vertex sets are allowed more cross-cut edges.
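The objectives above can be checked numerically; a minimal NumPy sketch on a hypothetical toy graph (two triangles joined by one weak edge), where the function names and the example weights are illustrative assumptions, not from the notes:

```python
import numpy as np

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5},
# joined by a single weak cross edge (2,3) of weight 0.1.
M = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    M[i, j] = M[j, i] = 1.0
M[2, 3] = M[3, 2] = 0.1

def cut(M, A, B):
    # Total weight of edges crossing from A to B.
    return M[np.ix_(A, B)].sum()

def ratio_cut(M, A, B):
    # w(v) = 1, so w(A) = |A|: cut(A,B) * (1/|A| + 1/|B|).
    return cut(M, A, B) * (1 / len(A) + 1 / len(B))

def normalized_cut(M, A, B):
    # w(v) = deg(v), so w(A) = vol(A): cut(A,B) * (1/vol(A) + 1/vol(B)).
    deg = M.sum(axis=1)
    return cut(M, A, B) * (1 / deg[A].sum() + 1 / deg[B].sum())

A, B = [0, 1, 2], [3, 4, 5]
print(cut(M, A, B))             # 0.1
print(ratio_cut(M, A, B))       # 0.1 * (1/3 + 1/3)
print(normalized_cut(M, A, B))  # 0.1 * (1/6.1 + 1/6.1)
```

Note that a balanced split beats cutting off a single vertex under ratio cut, which is the point of the vertex weights.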
These are discrete optimization problems: NP-complete \why. Effective heuristics exist.
(Newman) Modularity: the amount by which the number of edges within $C$ exceeds the expected number under Chung's degree-distribution-preserving random graph model.
\core A relaxation of the normalized cut objective: find a solution to the generalized eigenvector problem $Lq = \lambda W q$ over the graph Laplacian $L = D - M$ (with $M$ the weight matrix, $D = \mathrm{diag}(\deg(v))$), then partition points derived from the top (smallest-eigenvalue) $k$ eigenvectors.
Represent the cut by a partition vector $q \in \{+1,-1\}^n$ ($q_i = +1$ for $i \in A$, $-1$ for $i \in B$). Then the Rayleigh quotient: $q^T L q = \sum_{(i,j) \in E} w_{ij}(q_i - q_j)^2$; so $\frac{q^T L q}{q^T q} = \frac{4\,\mathrm{cut}(A,B)}{n}$.
Use partition vector $q$ with $q_i = +\sqrt{w(B)/w(A)}$ for $i \in A$; else $q_i = -\sqrt{w(A)/w(B)}$. Then $q^T W e = 0$ (with $e$ the all-ones vector); $\frac{q^T L q}{q^T W q} = \mathrm{cut}(A,B)\left(\frac{1}{w(A)} + \frac{1}{w(B)}\right)$.
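Both identities can be verified numerically; a sketch in NumPy on a hypothetical toy graph, taking $w(v) = \deg(v)$ (the normalized cut case, $W = D$):

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by a weak edge.
M = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    M[i, j] = M[j, i] = 1.0
M[2, 3] = M[3, 2] = 0.1

L = np.diag(M.sum(axis=1)) - M   # graph Laplacian L = D - M
w = M.sum(axis=1)                # vertex weights w(v) = deg(v), i.e. W = D
W = np.diag(w)

A, B = [0, 1, 2], [3, 4, 5]
wA, wB = w[A].sum(), w[B].sum()
# q_i = +sqrt(w(B)/w(A)) on A, -sqrt(w(A)/w(B)) on B
q = np.where(np.isin(np.arange(6), A), np.sqrt(wB / wA), -np.sqrt(wA / wB))

e = np.ones(6)
print(q @ W @ e)                 # ~0: q is W-orthogonal to e
quotient = (q @ L @ q) / (q @ W @ q)
objective = M[np.ix_(A, B)].sum() * (1 / wA + 1 / wB)
print(quotient, objective)       # the two values agree
```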
As $W = I$, the quotient is the ratio cut objective; so minimizing it minimizes ratio cut. As $W = D$, the quotient is the normalized cut objective.
Minimizing this subject to suitable discrete $q$'s is still NP-complete; so use a real relaxation. So, solve $\min_q \frac{q^T L q}{q^T W q}$ subject to $q^T W e = 0$, and deduce a partition vector from the real solution (e.g. by thresholding at $0$). The optimization problem is solved when $q$ is the eigenvector corresponding to the second smallest eigenvalue $\lambda_2$ of the generalized eigenvector problem $Lq = \lambda W q$. \why
To find $k$ clusters, do this recursively, or pick the $k$ smallest-eigenvalue eigenvectors and partition the points formed by putting the various components of these eigenvectors together (node $i$ maps to the $k$-vector of $i$-th components), maybe using k-means.
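A minimal sketch of the relaxed bipartition in NumPy, on a hypothetical toy graph; the generalized problem $Lq = \lambda D q$ is solved via the equivalent symmetric problem $D^{-1/2} L D^{-1/2} u = \lambda u$ with $q = D^{-1/2} u$:

```python
import numpy as np

# Hypothetical toy graph: two triangles {0,1,2}, {3,4,5} joined by a weak edge.
M = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    M[i, j] = M[j, i] = 1.0
M[2, 3] = M[3, 2] = 0.1

d = M.sum(axis=1)
L = np.diag(d) - M                 # Laplacian L = D - M
D_isqrt = np.diag(1 / np.sqrt(d))
L_sym = D_isqrt @ L @ D_isqrt      # D^{-1/2} L D^{-1/2}

vals, vecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
q = D_isqrt @ vecs[:, 1]               # generalized ev for 2nd smallest lambda
part = q > 0                           # threshold the real solution at 0
print(np.where(part)[0], np.where(~part)[0])  # recovers the two triangles
```

For $k > 2$: map the $k$ smallest-eigenvalue columns of `vecs` through `D_isqrt`, stack them as columns, and run k-means on the resulting $k$-dimensional node embeddings.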
Requires time: $O(n^3)$ in the worst case to find the eigenvectors (much less for sparse graphs via iterative methods like Lanczos); plus the cost of clustering (e.g. k-means) to find $k$ clusters among the $n$ graph nodes.
\core
(Dhillon). Finding the second eigenvector of $L$ for a bipartite graph: take the adjacency matrix $M$ and find its 2nd left and right singular vectors; as $M$ is smaller than $L$, you save on computation.
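A NumPy sketch of this trick, with a hypothetical small biadjacency matrix: the SVD of the degree-normalized $M_n = D_1^{-1/2} M D_2^{-1/2}$ carries the spectrum of the full graph's normalized Laplacian (eigenvalues $1 \mp \sigma_i$), so the 2nd singular vector pair plays the role of the 2nd eigenvector:

```python
import numpy as np

# Hypothetical bipartite graph: 3 "row" nodes x 4 "column" nodes,
# two blocks joined weakly through column 1.
M = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.2, 1.0, 1.0]])
d1, d2 = M.sum(axis=1), M.sum(axis=0)
Mn = np.diag(d1 ** -0.5) @ M @ np.diag(d2 ** -0.5)

U, s, Vt = np.linalg.svd(Mn)           # s[0] = 1 always
# 2nd left/right singular vectors -> joint embedding of rows and columns;
# its sign pattern bipartitions both vertex sets at once.
z = np.concatenate([np.diag(d1 ** -0.5) @ U[:, 1],
                    np.diag(d2 ** -0.5) @ Vt[1, :]])

# Consistency check against the full (3+4)-node graph's normalized Laplacian:
A = np.block([[np.zeros((3, 3)), M], [M.T, np.zeros((4, 4))]])
d = A.sum(axis=1)
L_sym = np.eye(7) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)
lam = np.sort(np.linalg.eigvalsh(L_sym))
print(np.isclose(lam[1], 1 - s[1]))    # True: 2nd sv <-> 2nd smallest eigenvalue
```

The SVD runs on a $3 \times 4$ matrix instead of an eigendecomposition of a $7 \times 7$ one, which is the claimed saving.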
Does kernel k-means among the nodes. Equivalent to solving a relaxation of the normalized cut objective.