Total variation distance between distributions
Aka statistical distance. Sample space \(X\). \(\Del(D, D') = 2^{-1} \sum_{x \in X} |D(x) - D'(x)| \in [0,1]\). Note that \(\sum_{x \in X} (D(x) - D'(x)) = 0\): the positive and negative deviations have equal total mass.
Visualize as space between probability curves. Total prob under either curve is 1.
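A minimal numerical sketch of the definition, assuming a finite sample space and NumPy (the pmfs below are made up for illustration):

```python
import numpy as np

def tv_distance(D, Dp):
    """Total variation distance: half the L1 distance between two pmfs on a finite X."""
    return 0.5 * np.sum(np.abs(D - Dp))

# Two illustrative pmfs on X = {0, 1, 2, 3}.
D  = np.array([0.4, 0.3, 0.2, 0.1])
Dp = np.array([0.1, 0.2, 0.3, 0.4])

print(tv_distance(D, Dp))   # 0.4 (always in [0, 1])
print(np.sum(D - Dp))       # ≈ 0: the signed differences cancel
```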
Largest deviation in event probability
For events \(E \subseteq X\): \(\max_{E \subseteq X} |Pr_{D}(x \in E) - Pr_{D'}(x \in E)| = \Del(D, D')\); the maximum is attained at \(E = \{x : D(x) > D'(x)\}\). Equivalently, the (signed) area between the curves covered by any \(E\) is at most half the total (unsigned) area between them. Useful in bounding the probability of events.
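A brute-force check of this characterization on the same illustrative pmfs (a sketch, not from the source):

```python
import numpy as np
from itertools import chain, combinations

D  = np.array([0.4, 0.3, 0.2, 0.1])   # same illustrative pmfs as above
Dp = np.array([0.1, 0.2, 0.3, 0.4])

# Enumerate every event E ⊆ X and take the largest probability gap.
X = range(len(D))
events = chain.from_iterable(combinations(X, r) for r in range(len(D) + 1))
max_gap = max(abs(D[list(E)].sum() - Dp[list(E)].sum()) for E in events)

print(max_gap)                                 # 0.4 = Δ(D, D')
print(D[D > Dp].sum() - Dp[D > Dp].sum())      # 0.4: attained at E = {x : D(x) > D'(x)}
```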
Code-length divergence
(Kullback-Leibler) Aka information divergence, information gain, relative entropy. A particular Bregman divergence; the general case is specified in the vector spaces ref. For the connection with entropy and cross entropy, see the information theory ref.
$$K(D||D') = E_{x \distr D}[\log \frac{D(x)}{D'(x)}] = \sum D(x) \log \frac{D(x)}{D'(x)} = \sum D(x) \log \frac{1}{D'(x)} - H(D) = H_c(D, D') - H(D)$$
Expected number of extra bits used to code samples from \(D\) using a code based on \(D'\).
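A hedged numerical check of \(K(D||D') = H_c(D, D') - H(D)\), assuming NumPy and natural logs (nats rather than bits; the pmfs are arbitrary):

```python
import numpy as np

def kl(D, Dp):
    """K(D||D') = sum_x D(x) log(D(x)/D'(x)), here in nats."""
    return np.sum(D * np.log(D / Dp))

D  = np.array([0.5, 0.25, 0.25])
Dp = np.array([0.25, 0.25, 0.5])

H       = -np.sum(D * np.log(D))     # entropy H(D)
H_cross = -np.sum(D * np.log(Dp))    # cross entropy H_c(D, D')

print(kl(D, Dp))       # ≈ 0.173
print(H_cross - H)     # same value: expected extra code length per sample
```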
Nonnegativity
See wiki diagram: each log-ratio is weighted by \(D(x)\), so greater weight falls on the points where \(\frac{D(x)}{D'(x)} \geq 1\) (the nonnegative terms), while the negative terms get less weight.
\(K(D||D') \geq 0\) (aka Gibbs' inequality): take probability distributions \(p, q\); get \(\sum_i p_i \log (p_i/ q_i) \geq 0\) using \(\ln x \leq x - 1\), as sketched below. \(K(D||D') = 0\) only if \(D = D'\), by the same idea.
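Sketch of the step, applying \(\ln x \leq x - 1\) with \(x = q_i/p_i\):
$$\sum_i p_i \ln \frac{p_i}{q_i} = -\sum_i p_i \ln \frac{q_i}{p_i} \geq -\sum_i p_i \left(\frac{q_i}{p_i} - 1\right) = 1 - \sum_i q_i \geq 0,$$
with equality throughout only when \(q_i = p_i\) for all \(i\), since equality in \(\ln x \leq x - 1\) requires \(x = 1\).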
Other properties
Not a metric, as it is asymmetric and does not satisfy the triangle inequality.
If \(\exists x: D(x) \neq 0, D'(x) = 0\), then \(K(D||D') = \infty\).
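Both properties in a quick numerical sketch, under the same assumptions as the KL sketch above (made-up pmfs, natural logs):

```python
import numpy as np

def kl(D, Dp):
    with np.errstate(divide='ignore'):   # suppress the divide-by-zero warning; a zero in D' gives inf as expected
        return np.sum(D * np.log(D / Dp))

D   = np.array([0.7, 0.2, 0.1])
Dp  = np.array([0.4, 0.3, 0.3])
Dpp = np.array([0.5, 0.5, 0.0])          # misses part of D's support

print(kl(D, Dp), kl(Dp, D))   # ≈ 0.201 vs ≈ 0.227: asymmetric
print(kl(D, Dpp))             # inf: some x has D(x) != 0 but D''(x) = 0
```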
Connection with variation distance
(Pinsker's inequality) \(\sqrt{K(P||Q)/2} \geq \Del(P, Q)\).\why
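A sanity check of the inequality on random pmfs (a sketch assuming NumPy; KL in nats, \(\Del\) as half the L1 distance):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(P, Q):
    return np.sum(P * np.log(P / Q))

def tv(P, Q):
    return 0.5 * np.sum(np.abs(P - Q))

# Random pmfs from a Dirichlet are strictly positive, so kl stays finite.
for _ in range(1000):
    P = rng.dirichlet(np.ones(5))
    Q = rng.dirichlet(np.ones(5))
    assert np.sqrt(kl(P, Q) / 2) >= tv(P, Q)
```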