Properties

Dependence on input variables

r-junta

f depends on only r variables. k-junta has at most 2k non 0 coefficients. Each of them is 2k+1 heavy: Make two equally high stacks unequal.

Witness to the relevance of variable xi: an assignment a which flips f(a) when ith bit is flipped.

Monotone fn f

Flipping xi from 1 to -1 cannot flip f(x) from -1 to 1. (i,xiyi)f(x1,..xn)f(y1,..yn).

Define g(xi,xi)=f(x) naturally. Then g(1,xi)=g(1,xi) iff \ g(1,xi)=1g(1,xi)=1;\ Exi[g(1,xi)g(1,xi)]=2Prxi(g(1,xi) =1g(1,xi)=1).

Influence

f^({i})=21(Exi[g(1,xi)]Exi[g(1,xi)])=Ii(f).

Total influence

As \(\norm{x}{1} \leq \sqrt{n}\norm{x}{2}\): I(f)=if^({i})nif^({i})2n.

For majority: I(f)n(2(n1)π)0.5.

Fairness

f is fair if it is 1 on exactly half the inputs.

Combinatorial properties

Sensitivity to bit i

Aka Influence of bit i. x(i): x with ith bit flipped. Sensitivity wrt bit i: Ii(f)=Prx(f(x)f(x(i)).

Total influence or avg sensitivity or Energy: I(f)=iIi(f).

Noise sensitivity

Aka noise stability. y=Nϵ(x): \ x with ϵ noise in each bit. NSϵ(f)=Prx,y(f(x)f(y))=2Prx,y(f(x)=1f(y)=1) by symmetry of sign changing bit flips.

Expressiveness measures

Let C be a class of boolean function. Viewing c as the dichotomy it induces on a given set is profitable here.

Count the number of f in C

One can use |C| or packing/ covering numbers. See the functional analysis survey for details.

Dichotomy count fn

Take S with |S|=m. DC(S): set of dichotomies on S induced by C; Max dichotomy count: DC(m)=max|S|=m|DC(S)|.

Shattering and VCD d

If DC(S)=2m, it is said to be shattered by C. VCD(C) is the size of maximal set which is shattered by C.

Bounding using the VCD

If m<=d, DC(m)=i=1m(ni)=2m=O(md). If m>d, we use Sauer’s lemma. These bounds are important in quantifying the number of samples required to accurately estimate probabilities of a class of events (an example event: a classifier is wrong.). For details, see statistics and computational learning theory surveys.

\paragraph{Sauer’s lemma} (Sauer) If d = vcd(C), by induction and intermediate function, \ DC(m)i=1d(mi)(emd)d=O(md).

Pf: Let s=|X|. By induction on s+d. \why Intuition: Can’t produce some dichotomy on some d+1 set.

Prove VCD(C) = d

Show some d-sized set shattered by C (Lower bound); show that no d+1 sized set is shattered (Upper bound). Use geometric intuition.

Lower bound

Try making a matrix of points shattered, whose rows are bit vectors of the shattered points.

Upper bound

Find optimal shattering geometry, optimal strategy; Use adversarial labelling. Find DC(m), solve DC(m)<2d. If C finite, shatters S, |C|2|S|, so log|C|d.

VCD’s of some R

Intervals in R: 2. Halfspaces in Rn: n+1. Axis aligned rectangles: 4. d-gons in a plane: 2d+1. VCD(CG)2dslog(es).

VCD(Cr,n)r(log(n/r)) if Cr,n embedding closed, contains function f with exactly r relevant vars. Take f. Key idea: A sample point sj is the witness of relevance of var j in f. Key idea: Make a matrix whose rows are points shattered by C. Let k=log(n/r). Build a rk×r2k block matrix: 1 row block for every sj. In row block j: every column block ij has same entry: sj,i; but column block i has all separate entries from 0,1k. To select any set of points/ rows, select embedding of f into the right set of relevant vars.

So, for decision list of r relevant vars: r(log(n/r))drlogn.