03 Bounds on deviation probability

Aka concentration of measure inequalities.

Expectation based deviation bound

(Aka Markov’s inequality). If $X \geq 0$: $\Pr(X \geq a) \leq \frac{E[X]}{a}$: the bound is tight when $X$ takes only the values $0$ and $a$.

Averaging argument. If $X \leq k$, then $\mu \leq c\mu \Pr(X \leq c\mu) + k\,(1 - \Pr(X \leq c\mu))$; so $\Pr(X \leq c\mu) \leq \frac{k - \mu}{k - c\mu}$.
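As a quick numerical sanity check (the four-point distribution and the constants $a$, $c$, $k$ below are illustrative assumptions, not from the text), both bounds can be verified exactly with rational arithmetic:

```python
from fractions import Fraction

# A hypothetical distribution on {0, 1, 2, 3} (an assumption for illustration).
values = [0, 1, 2, 3]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

mu = sum(v * p for v, p in zip(values, probs))  # E[X] = 7/8

# Markov: for X >= 0 and a > 0, Pr(X >= a) <= E[X]/a.
a = 2
tail = sum(p for v, p in zip(values, probs) if v >= a)
assert tail <= mu / a

# Averaging argument: with X <= k, Pr(X <= c*mu) <= (k - mu)/(k - c*mu).
k, c = 3, Fraction(1, 2)
lower_tail = sum(p for v, p in zip(values, probs) if v <= c * mu)
assert lower_tail <= (k - mu) / (k - c * mu)
```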

This technique is used repeatedly in other deviation bounds based on variance and moment generating functions.

Variance based deviation bound

(Aka Chebyshev’s inequality). By Markov’s inequality: $\Pr(|X - E[X]| \geq a) = \Pr((X - E[X])^2 \geq a^2) \leq \frac{Var[X]}{a^2}$.

Use in estimation of mean

$\Pr\left(\left|n^{-1}\sum_i (X_i - E[X_i])\right| \geq a\right) = \Pr\left(\left(\sum_i (X_i - E[X_i])\right)^2 \geq n^2 a^2\right) \leq \frac{\sum_i Var[X_i]}{n^2 a^2}$. Applicable for pair-wise independent Bernoulli trials, since variance is additive under pairwise independence.
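A minimal simulation sketch of this mean-estimation bound, using iid Bernoulli trials as a special case of pairwise independence (the parameters $p$, $n$, $a$ below are illustrative assumptions):

```python
import random

random.seed(0)

# Empirical check of Pr(|mean - p| >= a) <= p(1-p)/(n a^2)
# for iid Bernoulli(p) trials.
p, n, a, trials = 0.3, 200, 0.1, 10_000

var_bound = p * (1 - p) / (n * a * a)  # sum Var[X_i] / (n^2 a^2)

deviations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    if abs(mean - p) >= a:
        deviations += 1

empirical = deviations / trials
assert empirical <= var_bound
```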

Exponentially small deviation bounds

General technique

(Chernoff) $\Pr(X \geq a) = \Pr(e^{tX} \geq e^{ta}) \leq E[e^{tX}]/e^{ta}$: applying Markov’s inequality to $e^{tX}$. Used to bound both $\Pr(X > a)$ (with $t > 0$) and $\Pr(X < a)$ (with $t < 0$). Optimizing over $t$ gives a bound exponentially small in $\mu$ and the deviation.

For random variable sequences

Let $X = \sum_{i=1}^n X_i$ and $\mu = E[X] = \sum_i E[X_i]$. Note that the RVs are not necessarily identically distributed.

Pairwise independent RVs

Use variance based deviation bounds, as the variance of a sum of pairwise independent RVs is additive.
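One concrete illustration (a standard construction, chosen here as an assumption for the example): the XORs over the nonempty subsets of $m$ iid fair bits are pairwise independent but not mutually independent, and the variance of their sum is still additive:

```python
import itertools
import random
import statistics

random.seed(1)

# From m iid fair bits, the parities over the 2^m - 1 nonempty subsets
# are pairwise independent (but not mutually independent) Bernoulli(1/2) RVs.
m = 4
subsets = [s for r in range(1, m + 1) for s in itertools.combinations(range(m), r)]

samples = []
for _ in range(20_000):
    bits = [random.randrange(2) for _ in range(m)]
    xs = [sum(bits[i] for i in s) % 2 for s in subsets]
    samples.append(sum(xs))  # X = sum of pairwise independent RVs

# Variance is additive under pairwise independence:
# Var[X] = sum Var[X_i] = (2^m - 1) * 1/4.
expected_var = len(subsets) / 4
empirical_var = statistics.pvariance(samples)
assert abs(empirical_var - expected_var) < 0.5
```

Note that $X$ here takes only the values $0$ and $2^{m-1}$, which is why a Chernoff-type bound (which needs full independence) cannot hold, yet the variance calculation still does.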

Sum of n-wise independent RVs

Bounds from MGFs (moment generating functions).

$\Pr(X \geq a) = \Pr(e^{tX} \geq e^{ta}) \leq E[e^{tX}]/e^{ta} = \left(\prod_i E[e^{tX_i}]\right)/e^{ta}$: here we have used independence.

If $d > 0$, $\Pr(X \geq (1+d)\mu) \leq \frac{e^{\mu(e^t - 1)}}{e^{t(1+d)\mu}} \leq \frac{e^{d\mu}}{(1+d)^{(1+d)\mu}}$: using $t = \ln(1+d)$ and the $M_X$ bound $E[e^{tX}] \leq e^{\mu(e^t - 1)}$ (which holds for independent Bernoulli $X_i$, since $E[e^{tX_i}] = 1 + p_i(e^t - 1) \leq e^{p_i(e^t - 1)}$).

So, if $R = (1+d)\mu \geq 6\mu$: $d = R/\mu - 1 \geq 5$, and $\Pr(X \geq R) \leq \left(\frac{e}{1+d}\right)^R \leq \left(\frac{e}{6}\right)^R \leq 2^{-R}$.

If $d \in (0,1]$, $\Pr(X \geq (1+d)\mu) \leq e^{-\mu d^2/3}$: as $\frac{e^d}{(1+d)^{(1+d)}} \leq e^{-d^2/3}$: as $f(d) = d - (1+d)\ln(1+d) + \frac{d^2}{3} \leq 0$: as $f(0) = 0$ and $f'(d) = \frac{2d}{3} - \ln(1+d) \leq 0$ on $(0,1]$ (since $\ln(1+d) \geq d\ln 2 \geq \frac{2d}{3}$ there, by concavity).

If $d \in (0,1]$, $\Pr(X \leq (1-d)\mu) < \left(\frac{e^{-d}}{(1-d)^{(1-d)}}\right)^\mu \leq e^{-\mu d^2/2}$.

So, $\Pr(|X - \mu| \geq d\mu) \leq 2e^{-\mu d^2/3}$. \exclaim{So, probability of deviation from mean decreases exponentially with deviation from mean.}
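A simulation sketch comparing the empirical two-sided tail with $2e^{-\mu d^2/3}$ (the parameters $n$, $p$, $d$ are illustrative assumptions):

```python
import math
import random

random.seed(2)

# Simulate X = sum of n iid Bernoulli(p) and compare the empirical
# two-sided tail with the bound Pr(|X - mu| >= d*mu) <= 2*exp(-mu*d^2/3).
n, p, d, trials = 500, 0.2, 0.3, 5_000
mu = n * p

bound = 2 * math.exp(-mu * d * d / 3)

deviations = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))
    if abs(x - mu) >= d * mu:
        deviations += 1

empirical = deviations / trials
assert empirical <= bound
```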

Can be used in both additive and multiplicative forms.

Goodness of empirical mean

Now, $E[X_i] = p$ (Bernoulli trials). Using $X/n = \sum_i X_i/n$ to estimate the mean $p$. So, $\Pr\left(\left|\frac{\sum_i X_i}{n} - p\right| \geq dp\right) \leq 2e^{-npd^2/3}$. \exclaim{So, probability of erroneous estimate decreases exponentially with number of samples!}
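A small sketch of the resulting sample-complexity calculation: solving $2e^{-npd^2/3} \leq \delta$ for $n$ gives $n \geq 3\ln(2/\delta)/(pd^2)$. The function name and the parameters $p$, $d$, $\delta$ below are illustrative assumptions:

```python
import math

# Number of samples n making the failure probability
# 2*exp(-n*p*d^2/3) at most delta, i.e. n >= 3*ln(2/delta)/(p*d^2).
def samples_needed(p, d, delta):
    return math.ceil(3 * math.log(2 / delta) / (p * d * d))

n = samples_needed(p=0.5, d=0.1, delta=0.01)
assert 2 * math.exp(-n * 0.5 * 0.1**2 / 3) <= 0.01
```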

Code length divergence bound

Let Dp and Dq be probability distributions of binary random variables with probabilities p and q of being 1 respectively.

$D_p\left(\sum_i X_i \geq qn\right) \leq (n - qn + 1)\, e^{-n\, KL(D_q\|D_p)}$.

\pf{Suppose that $X_i \sim D_p$ and that $p < q$, $k \geq qn$. Since $D_q(\sum_i X_i = k) \leq 1$:

$D_p\left(\sum_i X_i = k\right) \leq \frac{D_p(\sum_i X_i = k)}{D_q(\sum_i X_i = k)} = \left(\frac{p}{q}\right)^k \left(\frac{1-p}{1-q}\right)^{n-k} \leq \left(\frac{p}{q}\right)^{qn} \left(\frac{1-p}{1-q}\right)^{n(1-q)} = e^{-n\, KL(D_q\|D_p)}$,

using $p/q < 1$ with $k \geq qn$, and $(1-p)/(1-q) > 1$ with $n - k \leq n(1-q)$.

So, taking the union bound over the at most $n - qn + 1$ values $k \geq qn$, we have the result.}

Using the connection between the code length divergence and the total variation distance (Pinsker’s inequality): $KL(D_q\|D_p) \geq 2(p-q)^2$. This can be used to derive other deviation bounds.
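A numerical sketch checking the divergence bound against the exact binomial tail, together with the Pinsker-type inequality (the parameters $n$, $p$, $q$ are illustrative assumptions):

```python
import math

def kl_bernoulli(q, p):
    """KL(D_q || D_p) for Bernoulli distributions with parameters q and p."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# Exact upper tail of Binomial(n, p) vs the divergence bound (p < q).
n, p, q = 100, 0.3, 0.5
k0 = math.ceil(q * n)

tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))
bound = (n - k0 + 1) * math.exp(-n * kl_bernoulli(q, p))
assert tail <= bound

# Pinsker-type connection: KL(D_q || D_p) >= 2*(p - q)^2.
assert kl_bernoulli(q, p) >= 2 * (p - q) ** 2
```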

Additive deviation bounds

See Azuma inequality section.

iid RV: Tightness of the Chernoff bound

(Cramer) Take $l(a) = \max_t \left(ta - \ln M(t)\right)$, where $M(t) = E[e^{tX_1}]$ is the MGF. For large $n$: $\Pr\left(\sum_i X_i \geq na\right) \geq e^{-n(l(a)+\epsilon)}$ \why. Combining with Chernoff, $\Pr\left(\sum_i X_i \geq na\right) = e^{-n(l(a)+\epsilon_n)}$ for some sequence $(\epsilon_n) \to 0$.
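For Bernoulli($p$) trials the rate function works out to $l(a) = a\ln\frac{a}{p} + (1-a)\ln\frac{1-a}{1-p}$ for $a \in (p, 1)$ (a standard Legendre-transform computation, stated here as an assumption of the example). A sketch comparing the exact tail with $e^{-n\,l(a)}$, showing both the Chernoff direction and the shrinking gap:

```python
import math

# Rate function for iid Bernoulli(p), a > p:
# l(a) = max_t (t*a - ln M(t)) with M(t) = 1 - p + p*e^t.
def rate(a, p):
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def log_tail(n, p, a):
    """(1/n) * ln Pr(sum X_i >= n*a), exact, computed stably in log space."""
    k0 = math.ceil(a * n)
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
            for k in range(k0, n + 1)]
    m = max(logs)  # log-sum-exp to avoid underflow for large n
    return (m + math.log(sum(math.exp(L - m) for L in logs))) / n

p, a = 0.3, 0.5
# Chernoff gives (1/n) ln Pr <= -l(a) for every n; Cramer says it converges.
for n in (100, 400, 1600):
    assert log_tail(n, p, a) <= -rate(a, p)
# Tightness: the gap shrinks as n grows.
assert abs(log_tail(1600, p, a) + rate(a, p)) < abs(log_tail(100, p, a) + rate(a, p))
```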

\part{Probabilistic Analysis Techniques}