Vertex set: metrics, norms

A similarity measure between vertices u and v can be obtained by inverting or negating a distance metric: the smaller the distance, the greater the similarity.

Based on paths and random walks

(Shortest path): negated length of the shortest path between u and v.

Katz measure

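$\mathrm{sim}(u, v) = \sum_{l = 1}^{\infty} β^{l} |\mathrm{paths}_{u, v}(l)|$, where $\mathrm{paths}_{u, v}(l)$ is the set of paths of length $l$ from u to v. Matrix of scores: $K = \sum_{i = 1}^{\infty} β^{i} A^{i}$, where A is the adjacency matrix, so that $(A^{i})_{u, v}$ counts the paths of length $i$ from u to v.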

The damping parameter

$K = (I - β A)^{-1} - I$ if the sum converges. To see when it does, use the eigenvalue decomposition $A = U Λ U^{T}$, with the eigenvalues $λ_{i}$ in descending order. Then $\sum_{i=1}^{\infty} β^{i} A^{i} = U (\sum_{i=1}^{\infty} β^{i} Λ^{i}) U^{T}$, which does not converge for every $β \geq 0$: if a diagonal entry of the inner sum diverges, the product involves multiplication by $\infty$, which is not well defined. Condition for convergence: $β λ_{1} < 1$.
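As a worked step, writing the decomposition as $A = U Λ U^{T}$ and assuming A is symmetric with nonnegative entries (so that $|λ_{j}| \leq λ_{1}$ for every j), the matrix series reduces to one scalar geometric series per eigenvalue:

$$\sum_{i=1}^{\infty} (β λ_{j})^{i} = \frac{β λ_{j}}{1 - β λ_{j}} = \frac{1}{1 - β λ_{j}} - 1 \quad \text{if } |β λ_{j}| < 1 .$$

Placing these values on the diagonal recovers the closed form:

$$K = U \left( (I - β Λ)^{-1} - I \right) U^{T} = (I - β A)^{-1} - I .$$

For example, the path on three vertices has $λ_{1} = \sqrt{2}$, so the sum converges for $β < 1 / \sqrt{2} \approx 0.707$.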

But for $β > 1$, the intuition of weighting longer paths less does not hold, since $β^{l}$ then grows with $l$.

Variants of Katz

Similarly, one can use the matrix exponential $e^{- β A}$.

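Truncated Katz is usually used: $\sum_{l = 1}^{k} β^{l} A^{l}$. With a sparse A, each of the k products of A with a dense $n \times n$ matrix costs $O(n \, \mathrm{nz}(A))$, where $\mathrm{nz}(A)$ is the number of nonzeros of A, so the total is $O(k n \, \mathrm{nz}(A))$ operations instead of the $O(n^{3})$ needed to find the inverse.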
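A minimal sketch of the truncated sum, assuming an unweighted graph stored as adjacency lists. The function name `truncated_katz` and the `neighbors` representation are illustrative choices, not part of the notes. Each pass multiplies the previous term by $β A$ using only the nonzeros of A.

```python
def truncated_katz(neighbors, beta, k):
    """Return the n x n matrix sum_{l=1..k} beta^l A^l as a list of rows.

    neighbors[i] lists the nodes adjacent to node i (unweighted graph assumed).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(neighbors)

    # term holds beta^l A^l; start with l = 1, i.e. beta * A.
    term = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for w in neighbors[i]:
            term[i][w] += beta

    K = [row[:] for row in term]
    for _ in range(k - 1):
        # (beta A) @ term: row i is beta times the sum of term's rows
        # over the neighbors of i, so only the nonzeros of A are visited.
        term = [[beta * sum(term[w][j] for w in neighbors[i]) for j in range(n)]
                for i in range(n)]
        for i in range(n):
            for j in range(n):
                K[i][j] += term[i][j]
    return K


# Path graph 0 - 1 - 2: nodes 0 and 2 are joined by one path of length 2.
K = truncated_katz([[1], [0, 2], [1]], beta=0.5, k=2)
print(K[0][2])  # beta^2 * 1 = 0.25
```

The score matrix itself is dense, so this only saves work in the multiplications, not in storage.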

Hitting time

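$- h_{u, v}$, where $h_{u, v}$ is the expected number of steps a random walk starting at u takes to first reach v. Normed by the stationary distribution: $- h_{u, v} π_{v}$, to correct for hitting times being small from every u when $π_{v}$ is large.

Commute time: $- h_{u, v} - h_{v, u}$; normed by the stationary distribution: $- h_{u, v} π_{v} - h_{v, u} π_{u}$.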
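The four scores can be sketched as follows, assuming a connected, undirected, unweighted graph stored as adjacency lists. The helper names are hypothetical. Hitting times are found by repeatedly applying the recurrence $h_{x, v} = 1 + \frac{1}{|Γ(x)|} \sum_{w \in Γ(x)} h_{w, v}$ with $h_{v, v} = 0$, which is one of several ways to solve this linear system.

```python
def hitting_times_to(neighbors, target, sweeps=10000, tol=1e-12):
    """h[x] = expected steps for a simple random walk from x to first reach target.

    Gauss-Seidel sweeps over h[x] = 1 + mean(h[w] for w in neighbors[x]),
    with h[target] fixed at 0. Assumes a connected undirected graph.
    """
    h = [0.0] * len(neighbors)
    for _ in range(sweeps):
        change = 0.0
        for x in range(len(neighbors)):
            if x == target:
                continue
            new = 1.0 + sum(h[w] for w in neighbors[x]) / len(neighbors[x])
            change = max(change, abs(new - h[x]))
            h[x] = new
        if change < tol:
            break
    return h


def random_walk_scores(neighbors, u, v):
    """Negated hitting and commute times, raw and normed by the stationary distribution."""
    h_uv = hitting_times_to(neighbors, v)[u]
    h_vu = hitting_times_to(neighbors, u)[v]
    # Stationary distribution of a simple random walk: degree / (sum of degrees).
    total_degree = sum(len(nb) for nb in neighbors)
    pi = [len(nb) / total_degree for nb in neighbors]
    return {
        "hitting": -h_uv,
        "hitting_normed": -h_uv * pi[v],
        "commute": -h_uv - h_vu,
        "commute_normed": -h_uv * pi[v] - h_vu * pi[u],
    }


# Path graph 0 - 1 - 2: the walk from one end to the other takes 4 steps on average.
print(random_walk_scores([[1], [0, 2], [1]], 0, 2))
```

For larger graphs a direct linear solve is usually preferable to these sweeps; the iteration is used here only to keep the sketch dependency-free.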

Rooted PageRank

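A random walk can get lost in parts of the graph away from u and v, so reset it at random: at each step, return to u with probability $α$, and otherwise move to a random neighbor. The score of v is its stationary probability under this walk.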
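A minimal sketch of the walk with resets, assuming an unweighted graph in which every node has at least one neighbor. The function name, the default reset probability of 0.15, and the fixed iteration count are assumptions made for illustration. The stationary distribution is approximated by power iteration from the root.

```python
def rooted_pagerank(neighbors, root, alpha=0.15, iters=200):
    """Approximate stationary distribution of a random walk that, at each step,
    jumps back to `root` with probability alpha and otherwise moves to a
    uniformly random neighbor. Assumes every node has at least one neighbor.
    """
    n = len(neighbors)
    p = [0.0] * n
    p[root] = 1.0  # start the walk at the root
    for _ in range(iters):
        nxt = [0.0] * n
        nxt[root] += alpha  # p sums to 1, so a total mass of alpha resets
        for x in range(n):
            share = (1.0 - alpha) * p[x] / len(neighbors[x])
            for w in neighbors[x]:
                nxt[w] += share
        p = nxt
    return p


# Path graph 0 - 1 - 2 rooted at node 0: node 1 scores higher than node 2.
scores = rooted_pagerank([[1], [0, 2], [1]], root=0)
print(scores)
```

The entry `scores[v]` then serves as the similarity of v to the root.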

SimRank

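A recursive definition: two nodes are similar to the extent that they are joined to similar nodes. $\mathrm{sim}(x, x) = 1$; for $x \neq y$, $\mathrm{sim}(x, y) = \frac{\sum_{a \in Γ(x)} \sum_{b \in Γ(y)} \mathrm{sim}(a, b)}{|Γ(x)| |Γ(y)|}$. The usual definition also multiplies the right-hand side by a decay factor $γ \in (0, 1)$.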
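The recursion can be sketched as a fixed-point iteration starting from the identity. The `decay` factor (0.8 here) is an assumption that does not appear in the formula above; it is commonly included in SimRank, and `decay=1.0` gives the formula as written. The function name and iteration count are likewise illustrative.

```python
def simrank(neighbors, decay=0.8, iters=10):
    """Iterate sim(x, y) = decay * mean of sim(a, b) over neighbor pairs (a, b),
    with sim(x, x) = 1. neighbors[i] lists the nodes adjacent to node i.
    """
    n = len(neighbors)
    sim = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]
    for _ in range(iters):
        new = [[0.0] * n for _ in range(n)]
        for x in range(n):
            new[x][x] = 1.0
            for y in range(x + 1, n):
                if not neighbors[x] or not neighbors[y]:
                    continue  # a node without neighbors is similar only to itself
                total = sum(sim[a][b] for a in neighbors[x] for b in neighbors[y])
                score = decay * total / (len(neighbors[x]) * len(neighbors[y]))
                new[x][y] = new[y][x] = score  # the measure is symmetric
        sim = new
    return sim


# Path graph 0 - 1 - 2: the two endpoints share their only neighbor.
S = simrank([[1], [0, 2], [1]])
print(S[0][2])  # decay * sim(1, 1) = 0.8
```

On this bipartite example the score of adjacent nodes stays at 0, since no pair of their neighbors is ever similar.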

Common neighbor based

Common neighbors: $|Γ(u) \cap Γ(v)|$, the same as the inner product of the rows of the adjacency matrix A corresponding to u and v.

(Jaccard): $\frac{|Γ(u) \cap Γ(v)|}{|Γ(u) \cup Γ(v)|}$, the probability that a feature picked at random from those of u or v is a feature of both.

(Adamic/Adar): $\sum_{z \in Γ(u) \cap Γ(v)} \frac{1}{\log |Γ(z)|}$, which gives greater weight to rare features present in both.
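The three neighbor-based scores can be sketched with set operations, assuming an undirected graph stored as a dictionary from node to neighbor set and distinct nodes u and v. The function names and the small example graph are made up for illustration.

```python
import math


def common_neighbors(neighbors, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(neighbors[u] & neighbors[v])


def jaccard(neighbors, u, v):
    """Shared neighbors as a fraction of all neighbors of u or v."""
    union = neighbors[u] | neighbors[v]
    return len(neighbors[u] & neighbors[v]) / len(union) if union else 0.0


def adamic_adar(neighbors, u, v):
    """Sum of 1 / log(degree) over shared neighbors.

    A shared neighbor of two distinct nodes has degree at least 2,
    so the logarithm is always positive.
    """
    return sum(1.0 / math.log(len(neighbors[z]))
               for z in neighbors[u] & neighbors[v])


# Hypothetical graph with edges 0-2, 0-3, 1-2, 1-3, 1-4, 3-4.
graph = {0: {2, 3}, 1: {2, 3, 4}, 2: {0, 1}, 3: {0, 1, 4}, 4: {1, 3}}
print(common_neighbors(graph, 0, 1))  # 2 (nodes 2 and 3)
print(jaccard(graph, 0, 1))           # 2 of the 3 nodes in the union
print(adamic_adar(graph, 0, 1))       # 1/log(2) + 1/log(3)
```

Node 2 has degree 2 and node 3 has degree 3, so the rarer shared neighbor contributes more to the Adamic/Adar score.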