Search

Ranking search query results

Queries $q \in Q$: bags of words. Documents $d \in D$: sequences of words. For query $q_i$, retrieved relevant documents $D_i \subseteq D$; got full or partial orderings $L_i$. Let $Q = \{q_i\}$. Maybe want to complete the $Q$ vs $D$ relevance-level matrix. Or, given documents $D'$ relevant to an unseen query $q$, want to rank $D'$.

Feature extraction

$\phi: Q \times D \to X \subseteq \mathbb{R}^D$; the feature dimension $D$ is usually around 10. $\phi_i(q, d)$ could be TF-IDF, common-word count, etc. Let $f: X \to \mathbb{R}$ be a scoring function; it induces a relevance order $z_i$ on $D$ for every query $q_i$; let $F = \{f\}$.
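A minimal sketch of such a feature map and linear scoring function. The feature names and weights here are hypothetical stand-ins, not from the notes: common-word count and total query-term frequency as crude TF-style features.

```python
from collections import Counter

def phi(query_words, doc_words):
    # Sketch of phi(q, d) -> R^3: [common-word count, query-term
    # frequency in doc, doc length]. Real systems would use TF-IDF etc.
    q, d = set(query_words), Counter(doc_words)
    common = sum(1 for w in q if w in d)   # distinct query words present
    tf = sum(d[w] for w in q)              # total query-term occurrences
    return [common, tf, len(doc_words)]

def f(x, w=(1.0, 0.5, 0.0)):
    # Linear scoring function f(x) = w^T x; sorting docs by f(phi(q, d))
    # induces the relevance order z_i. Weights w are arbitrary here.
    return sum(wi * xi for wi, xi in zip(w, x))
```

Ranking a query's documents then just means sorting them by their `f` score, descending.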

Objectives to optimize

Let $y_i$ be the ground-truth relevance ordering of $D$ corresponding to query $q_i$; let $y$ be induced by $g: X \to \mathbb{R}$.

Loss function: $L(f, q_i)$ measures the deviation of $z_i$ from $y_i$. Risk: $R = \int_Q L(f, q)\, f_Q(q)\, dq$. Want to find the $f$ which minimizes $R$. But $\Pr(q)$ is usually unknown; so minimize the empirical risk instead: $\hat{R} = n^{-1} \sum_i L(f, q_i)$. Assume that $|R - \hat{R}| \to 0$.
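The empirical risk is just the average per-query loss over the training queries; a one-liner, assuming the per-query losses $L(f, q_i)$ have already been computed by whichever loss function is in use:

```python
def empirical_risk(per_query_losses):
    # R_hat = (1/n) * sum_i L(f, q_i); per_query_losses[i] = L(f, q_i)
    return sum(per_query_losses) / len(per_query_losses)
```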

Various Loss functions L

Pointwise

\tbc

Pairwise

Checks whether a pair's ordering is violated: want $y_i(d) > y_i(d') \implies f(d) > f(d')$; so a pair is violated when $(y_i(d) - y_i(d'))(f(d) - f(d')) \leq 0$. Widely used. E.g.: SVM ordinal regression; RankNet (best pairwise method as of 2009); LambdaRank: $f(d) = w^T d$.
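A sketch of the pairwise idea for one query, under the assumption that `y` holds ground-truth relevance grades and `s` holds model scores $f(d)$ for the same documents: the violation count applies the sign condition above directly, and a RankNet-style loss replaces the 0/1 violation with a smooth logistic surrogate.

```python
import math

def pairwise_violations(y, s):
    # Count pairs where (y_i - y_j) * (s_i - s_j) <= 0, i.e. the model
    # ordering contradicts (or ties on) a ground-truth preference.
    return sum(1 for i in range(len(y)) for j in range(i + 1, len(y))
               if y[i] != y[j] and (y[i] - y[j]) * (s[i] - s[j]) <= 0)

def ranknet_loss(y, s):
    # RankNet-style logistic surrogate: for each pair with y_i > y_j,
    # add log(1 + exp(-(s_i - s_j))); small when s_i >> s_j.
    return sum(math.log1p(math.exp(-(s[i] - s[j])))
               for i in range(len(y)) for j in range(len(y))
               if y[i] > y[j])
```

Minimizing the logistic surrogate pushes down the violation count while staying differentiable in the scores.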

Listwise

Measures the badness of a larger part of the ranking, not just the correctness of pairs of documents. $q_i$ is uniquely identified by $(y_i, D)$; so can consider $L(f, y_i, D)$ or $L(z_i, y_i)$.

Using a probability distribution on permutations

Define a probability distribution over permutations $\Pr_f(\pi)$ with good properties: it favors permutations rated highly by $f$ over permutations rated lower, and the best permutation according to $f$ has maximum probability. Then find the $f$ whose $\Pr_f$ is closest to $\Pr_g$. Can minimize cross entropy: $L(f, g) = -\sum_\pi \Pr_g(\pi) \log \Pr_f(\pi)$ (ListNet); or $L(f, g) = -\log \Pr(y_i \mid D, f)$, thereby maximizing likelihood (ListMLE). Exact listwise loss calculation is hard: it may need to consider all $n!$ permutations; so instead consider only the ranking of the top $k$ documents induced by $f$. ListMLE was known to be the best listwise-loss-based ranking method (2009), also better than all pairwise-loss-based methods.
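A sketch of the ListMLE-style likelihood using the Plackett-Luce factorization (an assumption about the permutation model, though it is the standard choice): the probability of an ordering is a product of per-position softmax terms, so $-\log \Pr(y_i \mid D, f)$ needs only $n$ terms, not $n!$; truncating the loop after $k$ positions gives the top-$k$ variant mentioned above.

```python
import math

def listmle_loss(scores_in_true_order, k=None):
    # -log Pr(y | D, f) under Plackett-Luce, where the input scores are
    # f(d) for the documents listed in ground-truth order y:
    #   Pr = prod_j exp(s_j) / sum_{m >= j} exp(s_m)
    # Passing k truncates to the top-k positions of the true ranking.
    s = list(scores_in_true_order)
    positions = range(len(s) if k is None else min(k, len(s)))
    loss = 0.0
    for j in positions:
        denom = sum(math.exp(v) for v in s[j:])
        loss -= s[j] - math.log(denom)
    return loss
```

When the scores agree with the ground-truth order the loss is small; reversing them inflates it, which is what gradient descent on this loss exploits.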