03 Bounds on deviation probability

Aka concentration of measure inequalities.

Expectation based deviation bound

(Aka Markov’s inequality). If $X \geq 0$: $\Pr(X \geq a) \leq \frac{E[X]}{a}$: the bound is tight when $X$ takes only the values $0$ and $a$.

Averaging argument. If $X \leq k$, then $\mu \leq c\mu \Pr(X \leq c\mu) + k\,(1 - \Pr(X \leq c\mu))$; so $\Pr(X \leq c\mu) \leq \frac{k - \mu}{k - c\mu}$.
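As a quick numerical sanity check (the four-point distribution and the constants $a$, $c$, $k$ below are illustrative assumptions, not from the text), both bounds can be verified exactly with rational arithmetic:

```python
from fractions import Fraction

# A hypothetical distribution on {0, 1, 2, 3} (an assumption for illustration).
values = [0, 1, 2, 3]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

mu = sum(v * p for v, p in zip(values, probs))  # E[X] = 7/8

# Markov: for X >= 0 and a > 0, Pr(X >= a) <= E[X]/a.
a = 2
tail = sum(p for v, p in zip(values, probs) if v >= a)
assert tail <= mu / a

# Averaging argument: with X <= k, Pr(X <= c*mu) <= (k - mu)/(k - c*mu).
k, c = 3, Fraction(1, 2)
lower_tail = sum(p for v, p in zip(values, probs) if v <= c * mu)
assert lower_tail <= (k - mu) / (k - c * mu)
```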

This technique is used repeatedly in other deviation bounds based on variance and moment generating functions.

Variance based deviation bound

(Aka Chebyshev’s inequality). By Markov’s inequality: $\Pr(|X - E[X]| \geq a) = \Pr((X - E[X])^2 \geq a^2) \leq \frac{Var[X]}{a^2}$.

Use in estimation of mean

$\Pr\left(\left|n^{-1}\sum_i (X_i - E[X_i])\right| \geq a\right) = \Pr\left(\left(\sum_i (X_i - E[X_i])\right)^2 \geq n^2 a^2\right) \leq \frac{\sum_i Var[X_i]}{n^2 a^2}$. Applicable for pair-wise independent Bernoulli trials, since variance is additive under pairwise independence.
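A minimal simulation sketch of this mean-estimation bound, using iid Bernoulli trials as a special case of pairwise independence (the parameters $p$, $n$, $a$ below are illustrative assumptions):

```python
import random

random.seed(0)

# Empirical check of Pr(|mean - p| >= a) <= p(1-p)/(n a^2)
# for iid Bernoulli(p) trials.
p, n, a, trials = 0.3, 200, 0.1, 10_000

var_bound = p * (1 - p) / (n * a * a)  # sum Var[X_i] / (n^2 a^2)

deviations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    if abs(mean - p) >= a:
        deviations += 1

empirical = deviations / trials
assert empirical <= var_bound
```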

Exponentially small deviation bounds

General technique

(Chernoff) $\Pr(X \geq a) = \Pr(e^{tX} \geq e^{ta}) \leq E[e^{tX}]/e^{ta}$: applying Markov’s inequality to $e^{tX}$. Used to bound both $\Pr(X > a)$ (with $t > 0$) and $\Pr(X < a)$ (with $t < 0$). Optimizing over $t$ gives a bound exponentially small in $\mu$ and the deviation.

For random variable sequences

Let $X = \sum_{i=1}^n X_i$ and $\mu = E[X] = \sum_i E[X_i]$. Note that the RVs are not necessarily identically distributed.

Pairwise independent RVs

Use variance based deviation bounds, as the variance of a sum of pairwise independent RVs is additive.
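One concrete illustration (a standard construction, chosen here as an assumption for the example): the XORs over the nonempty subsets of $m$ iid fair bits are pairwise independent but not mutually independent, and the variance of their sum is still additive:

```python
import itertools
import random
import statistics

random.seed(1)

# From m iid fair bits, the parities over the 2^m - 1 nonempty subsets
# are pairwise independent (but not mutually independent) Bernoulli(1/2) RVs.
m = 4
subsets = [s for r in range(1, m + 1) for s in itertools.combinations(range(m), r)]

samples = []
for _ in range(20_000):
    bits = [random.randrange(2) for _ in range(m)]
    xs = [sum(bits[i] for i in s) % 2 for s in subsets]
    samples.append(sum(xs))  # X = sum of pairwise independent RVs

# Variance is additive under pairwise independence:
# Var[X] = sum Var[X_i] = (2^m - 1) * 1/4.
expected_var = len(subsets) / 4
empirical_var = statistics.pvariance(samples)
assert abs(empirical_var - expected_var) < 0.5
```

Note that $X$ here takes only the values $0$ and $2^{m-1}$, which is why a Chernoff-type bound (which needs full independence) cannot hold, yet the variance calculation still does.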

Sum of n-wise independent RVs

Bounds from MGFs (moment generating functions).

$\Pr(X \geq a) = \Pr(e^{tX} \geq e^{ta}) \leq E[e^{tX}]/e^{ta} = \left(\prod_i E[e^{tX_i}]\right)/e^{ta}$: here we have used independence.

If $d > 0$, $\Pr(X \geq (1+d)\mu) \leq \frac{e^{\mu(e^t - 1)}}{e^{t(1+d)\mu}} \leq \frac{e^{d\mu}}{(1+d)^{(1+d)\mu}}$: using $t = \ln(1+d)$ and the $M_X$ bound $E[e^{tX}] \leq e^{\mu(e^t - 1)}$ (which holds for independent Bernoulli $X_i$, since $E[e^{tX_i}] = 1 + p_i(e^t - 1) \leq e^{p_i(e^t - 1)}$).

So, if $R = (1+d)\mu \geq 6\mu$: $d = R/\mu - 1 \geq 5$, and $\Pr(X \geq R) \leq \left(\frac{e}{1+d}\right)^R \leq \left(\frac{e}{6}\right)^R \leq 2^{-R}$.

If $d \in (0,1]$, $\Pr(X \geq (1+d)\mu) \leq e^{-\mu d^2/3}$: as $\frac{e^d}{(1+d)^{(1+d)}} \leq e^{-d^2/3}$: as $f(d) = d - (1+d)\ln(1+d) + \frac{d^2}{3} \leq 0$: as $f(0) = 0$ and $f'(d) = \frac{2d}{3} - \ln(1+d) \leq 0$ on $(0,1]$ (since $\ln(1+d) \geq d\ln 2 \geq \frac{2d}{3}$ there, by concavity).

If $d \in (0,1]$, $\Pr(X \leq (1-d)\mu) < \left(\frac{e^{-d}}{(1-d)^{(1-d)}}\right)^\mu \leq e^{-\mu d^2/2}$.

So, $\Pr(|X - \mu| \geq d\mu) \leq 2e^{-\mu d^2/3}$. \exclaim{So, probability of deviation from mean decreases exponentially with deviation from mean.}
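A simulation sketch comparing the empirical two-sided tail with $2e^{-\mu d^2/3}$ (the parameters $n$, $p$, $d$ are illustrative assumptions):

```python
import math
import random

random.seed(2)

# Simulate X = sum of n iid Bernoulli(p) and compare the empirical
# two-sided tail with the bound Pr(|X - mu| >= d*mu) <= 2*exp(-mu*d^2/3).
n, p, d, trials = 500, 0.2, 0.3, 5_000
mu = n * p

bound = 2 * math.exp(-mu * d * d / 3)

deviations = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))
    if abs(x - mu) >= d * mu:
        deviations += 1

empirical = deviations / trials
assert empirical <= bound
```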

Can be used in both additive and multiplicative forms.

Goodness of empirical mean

Now, $E[X_i] = p$ (Bernoulli trials). Using $X/n = \sum_i X_i/n$ to estimate the mean $p$. So, $\Pr\left(\left|\frac{\sum_i X_i}{n} - p\right| \geq dp\right) \leq 2e^{-npd^2/3}$. \exclaim{So, probability of erroneous estimate decreases exponentially with number of samples!}
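A small sketch of the resulting sample-complexity calculation: solving $2e^{-npd^2/3} \leq \delta$ for $n$ gives $n \geq 3\ln(2/\delta)/(pd^2)$. The function name and the parameters $p$, $d$, $\delta$ below are illustrative assumptions:

```python
import math

# Number of samples n making the failure probability
# 2*exp(-n*p*d^2/3) at most delta, i.e. n >= 3*ln(2/delta)/(p*d^2).
def samples_needed(p, d, delta):
    return math.ceil(3 * math.log(2 / delta) / (p * d * d))

n = samples_needed(p=0.5, d=0.1, delta=0.01)
assert 2 * math.exp(-n * 0.5 * 0.1**2 / 3) <= 0.01
```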

Code length divergence bound

Let Dp and Dq be probability distributions of binary random variables with probabilities p and q of being 1 respectively.

$D_p\left(\sum_i X_i \geq qn\right) \leq (n - qn + 1)\, e^{-n\, KL(D_q\|D_p)}$.

\pf{Suppose that $X_i \sim D_p$ and that $p < q$, $k \geq qn$. Since $D_q(\sum_i X_i = k) \leq 1$:

$D_p\left(\sum_i X_i = k\right) \leq \frac{D_p(\sum_i X_i = k)}{D_q(\sum_i X_i = k)} = \left(\frac{p}{q}\right)^k \left(\frac{1-p}{1-q}\right)^{n-k} \leq \left(\frac{p}{q}\right)^{qn} \left(\frac{1-p}{1-q}\right)^{n(1-q)} = e^{-n\, KL(D_q\|D_p)}$,

using $p/q < 1$ with $k \geq qn$, and $(1-p)/(1-q) > 1$ with $n - k \leq n(1-q)$.

So, taking the union bound over the at most $n - qn + 1$ values $k \geq qn$, we have the result.}

Using the connection between the code length divergence and the total variation distance (Pinsker’s inequality): $KL(D_q\|D_p) \geq 2(p-q)^2$. This can be used to derive other deviation bounds.
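A numerical sketch checking the divergence bound against the exact binomial tail, together with the Pinsker-type inequality (the parameters $n$, $p$, $q$ are illustrative assumptions):

```python
import math

def kl_bernoulli(q, p):
    """KL(D_q || D_p) for Bernoulli distributions with parameters q and p."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# Exact upper tail of Binomial(n, p) vs the divergence bound (p < q).
n, p, q = 100, 0.3, 0.5
k0 = math.ceil(q * n)

tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))
bound = (n - k0 + 1) * math.exp(-n * kl_bernoulli(q, p))
assert tail <= bound

# Pinsker-type connection: KL(D_q || D_p) >= 2*(p - q)^2.
assert kl_bernoulli(q, p) >= 2 * (p - q) ** 2
```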

Additive deviation bounds

See Azuma inequality section.

iid RV: Tightness of the Chernoff bound

(Cramer) Take $l(a) = \max_t \left(ta - \ln M(t)\right)$, where $M(t) = E[e^{tX_1}]$ is the MGF. For large $n$: $\Pr\left(\sum_i X_i \geq na\right) \geq e^{-n(l(a)+\epsilon)}$ \why. Combining with Chernoff, $\Pr\left(\sum_i X_i \geq na\right) = e^{-n(l(a)+\epsilon_n)}$ for some sequence $(\epsilon_n) \to 0$.
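For Bernoulli($p$) trials the rate function works out to $l(a) = a\ln\frac{a}{p} + (1-a)\ln\frac{1-a}{1-p}$ for $a \in (p, 1)$ (a standard Legendre-transform computation, stated here as an assumption of the example). A sketch comparing the exact tail with $e^{-n\,l(a)}$, showing both the Chernoff direction and the shrinking gap:

```python
import math

# Rate function for iid Bernoulli(p), a > p:
# l(a) = max_t (t*a - ln M(t)) with M(t) = 1 - p + p*e^t.
def rate(a, p):
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def log_tail(n, p, a):
    """(1/n) * ln Pr(sum X_i >= n*a), exact, computed stably in log space."""
    k0 = math.ceil(a * n)
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
            for k in range(k0, n + 1)]
    m = max(logs)  # log-sum-exp to avoid underflow for large n
    return (m + math.log(sum(math.exp(L - m) for L in logs))) / n

p, a = 0.3, 0.5
# Chernoff gives (1/n) ln Pr <= -l(a) for every n; Cramer says it converges.
for n in (100, 400, 1600):
    assert log_tail(n, p, a) <= -rate(a, p)
# Tightness: the gap shrinks as n grows.
assert abs(log_tail(1600, p, a) + rate(a, p)) < abs(log_tail(100, p, a) + rate(a, p))
```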

\part{Probabilistic Analysis Techniques}