Properties

Dependence on input variables

r-junta

f depends on only \(r\) variables. k-junta has at most \(2^k\) non 0 coefficients. Each of them is \(2^{-k+1}\) heavy: Make two equally high stacks unequal.

Witness to the relevance of variable \(x_{i}\): an assignment a which flips f(a) when ith bit is flipped.

Monotone fn f

Flipping \(x_{i}\) from 1 to -1 cannot flip f(x) from -1 to 1. \((\forall i, x_{i} \leq y_{i}) \implies f(x_{1}, .. x_{n}) \leq f(y_{1}, .. y_{n})\).

Define \(g(x_{i}, x_{-i}) = f(x)\) naturally. Then \(g(-1,x_{-i}) = -g(1,x_{-i})\) iff \ \(g(-1,x_{-i}) = -1 \land g(1,x_{-i}) = 1\);\ $$E_{x_{-i}}[g(1,x_{-i}) - g(-1,x_{-i})] = 2 Pr_{x_{-i}}(g(1,x_{-i}) \ = 1 \land g(-1,x_{-i}) = -1 ) $$.

Influence

\(\hat{f}(\set{i}) = 2^{-1}(E_{x_{-i}}[g(1,x_{-i})] - E_{x_{-i}}[g(-1,x_{-i})]) = I_{i}(f)\).

Total influence

As \(\norm{x}{1} \leq \sqrt{n}\norm{x}{2}\): \(I(f) = \sum_{i} \hat{f}(\set{i}) \leq \sqrt{n}\sum_{i} \hat{f}(\set{i})^{2} \leq \sqrt{n}\).

For majority: \(I(f) \approx n(\frac{2}{(n-1)\pi})^{0.5}\).

Fairness

f is fair if it is 1 on exactly half the inputs.

Combinatorial properties

Sensitivity to bit i

Aka Influence of bit i. \(x^{(i)}\): x with ith bit flipped. Sensitivity wrt bit i: \(I_{i}(f) = Pr_{x}(f(x) \neq f(x^{(i)})\).

Total influence or avg sensitivity or Energy: \(I(f) = \sum_{i}I_{i}(f)\).

Noise sensitivity

Aka noise stability. \(y = N_{\eps}(x)\): \ x with \(\eps\) noise in each bit. \(NS_{\eps}(f) = Pr_{x,y}(f(x) \neq f(y)) = 2 Pr_{x,y}(f(x) = 1 \land f(y) = -1)\) by symmetry of sign changing bit flips.

Expressiveness measures

Let \(C\) be a class of boolean function. Viewing \(c\) as the dichotomy it induces on a given set is profitable here.

Count the number of f in C

One can use \(|C|\) or packing/ covering numbers. See the functional analysis survey for details.

Dichotomy count fn

Take S with \(|S| = m\). \(D_{C}(S)\): set of dichotomies on S induced by C; Max dichotomy count: \(D_{C}(m) = \max_{|S|=m}|D_{C}(S)|\).

Shattering and VCD d

If \(D_{C}(S) = 2^{m}\), it is said to be shattered by \(C\). VCD(C) is the size of maximal set which is shattered by \(C\).

Bounding using the VCD

If \(m<=d\), \(D_{C}(m) = \sum_{i=1}^{m}\binom{n}{i} = 2^{m} = O(m^{d})\). If \(m > d\), we use Sauer’s lemma. These bounds are important in quantifying the number of samples required to accurately estimate probabilities of a class of events (an example event: a classifier is wrong.). For details, see statistics and computational learning theory surveys.

\paragraph{Sauer’s lemma} (Sauer) If d = vcd(C), by induction and intermediate function, \ \(D_{C}(m) \leq \sum_{i=1}^{d} \binom{m}{i} \leq (\frac{em}{d})^{d} = O(m^{d})\).

Pf: Let \(s = |X|\). By induction on s+d. \why Intuition: Can’t produce some dichotomy on some d+1 set.

Prove VCD(C) = d

Show some d-sized set shattered by C (Lower bound); show that no \(d+1\) sized set is shattered (Upper bound). Use geometric intuition.

Lower bound

Try making a matrix of points shattered, whose rows are bit vectors of the shattered points.

Upper bound

Find optimal shattering geometry, optimal strategy; Use adversarial labelling. Find \(D_{C}(m)\), solve \(D_{C}(m)<2^{d}\). If C finite, shatters S, \(|C| \geq 2^{|S|}\), so \(\log |C| \geq d\).

VCD’s of some R

Intervals in R: 2. Halfspaces in \(R^{n}\): n+1. Axis aligned rectangles: 4. d-gons in a plane: 2d+1. \(VCD(C_{G}) \leq 2ds\log(es)\).

\(VCD(C_{r,n}) \geq r(\log (n/r))\) if \(C_{r, n}\) embedding closed, contains function f with exactly \(r\) relevant vars. Take f. Key idea: A sample point \(s_{j}\) is the witness of relevance of var j in f. Key idea: Make a matrix whose rows are points shattered by \(C\). Let \(k = \log (n/r)\). Build a \(rk \times r2^k\) block matrix: 1 row block for every \(s_{j}\). In row block j: every column block \(i \neq j\) has same entry: \(s_{j,i}\); but column block i has all separate entries from \({0,1}^k\). To select any set of points/ rows, select embedding of f into the right set of relevant vars.

So, for decision list of \(r\) relevant vars: \(r(\log (n/r))\leq d \leq \)r\( \log n\).