Vertex set: metrics, norms

Similarity measures between u and v are inverses of distance metrics.

Based on paths and random walks

  • (shortest path).

Katz measure

sim(u,v)=l=1βl|pathsu,v(l)|. Matrix of scores, K=i=1βiAi.

The damping parameter

K=(IβA)1I if the sum converges. Use \(A = U\EW U^{}\): EW decomposition, with λi ordered in descending order. \(\sum_{i=1}^{\infty} \gb^{i}A^{i} = U(\sum_{i=1}^{\infty} \gb^{i}\EW^{i})U^{}\) does not converge β0 and multiplication by is not well defined. Condition for convergence: βλ1<1.

But, for β>1 the intuition of weighting longer paths less does not hold.

Variants of Katz

Similarly can use eβA.

Truncated Katz usually used: l=1kβlAl: O(ln2nz(A)) op instead of O(n3) inverse finding.

Hitting time

hu,v; normed by stationary distribution: hu,vπv: to take care of skewing of hitting time due to large πv. - Commute time: hu,vhv,u; stationary distr normed: hu,vπvhv,uπu.

Rooted PageRank

Random walk can get lost in parts of graph away from u and v; so do random resets and return to u with probability a at each step.

SimRank

A recursive definition: 2 nodes are similar to the extant that they are joined to similar nodes. sim(x,x)=1;sim(x,y)=aΓ(x)bΓ(y)sim(a,b)|Γ(x)||Γ(y)|.

Common neighbor based

Common neighbors: |Γ(u)Γ(v)|: same as taking inner product of rows in adj matrix M corresponding to u and v. (Jaccard): Γ(u)Γ(v)Γ(u)Γ(v): pick a feature at random, see probability that it is a feature of both u and v. (Adamic/ Adar): zΓ(u)Γ(v)1log|Γ(z)|: greater wt to rare features present in both.