f depends on only variables. k-junta has at most non 0 coefficients. Each of them is heavy: Make two equally high stacks unequal.
Witness to the relevance of variable : an assignment a which flips f(a) when ith bit is flipped.
Flipping from 1 to -1 cannot flip f(x) from -1 to 1. .
Define naturally. Then iff \
;\
.
.
As \(\norm{x}{1} \leq \sqrt{n}\norm{x}{2}\): .
For majority: .
f is fair if it is 1 on exactly half the inputs.
Aka Influence of bit i. : x with ith bit flipped. Sensitivity wrt bit i: .
Total influence or avg sensitivity or Energy: .
Aka noise stability. : \
x with noise in each bit. by symmetry of sign changing bit flips.
Let be a class of boolean function. Viewing as the dichotomy it induces on a given set is profitable here.
One can use or packing/ covering numbers. See the functional analysis survey for details.
Take S with . : set of dichotomies on S induced by C; Max dichotomy count: .
Shattering and VCD d
If , it is said to be shattered by . VCD(C) is the size of maximal set which is shattered by .
If , . If , we use Sauer’s lemma. These bounds are important in quantifying the number of samples required to accurately estimate probabilities of a class of events (an example event: a classifier is wrong.). For details, see statistics and computational learning theory surveys.
\paragraph{Sauer’s lemma}
(Sauer) If d = vcd(C), by induction and intermediate function, \
.
Pf: Let . By induction on s+d. \why Intuition: Can’t produce some dichotomy on some d+1 set.
Show some d-sized set shattered by C (Lower bound); show that no sized set is shattered (Upper bound). Use geometric intuition.
Try making a matrix of points shattered, whose rows are bit vectors of the shattered points.
Find optimal shattering geometry, optimal strategy; Use adversarial labelling. Find , solve . If C finite, shatters S, , so .
Intervals in R: 2. Halfspaces in : n+1. Axis aligned rectangles: 4. d-gons in a plane: 2d+1. .
if embedding closed, contains function f with exactly relevant vars. Take f. Key idea: A sample point is the witness of relevance of var j in f. Key idea: Make a matrix whose rows are points shattered by . Let . Build a block matrix: 1 row block for every . In row block j: every column block has same entry: ; but column block i has all separate entries from . To select any set of points/ rows, select embedding of f into the right set of relevant vars.
So, for decision list of relevant vars: r.