Learnability despite noise

Classification noise

Random noise vs.\ adversarial noise: the learner gets \((x, l)\) instead of \((x, c(x))\).

Malicious noise: both \(x\) and \(c(x)\) may be corrupted.

Random Classification noise model

Uniform noise rate \(\eta \leq \eta_0 \in [0, .5)\); \(L\) is given \(\eta_0\), and is allowed time also polynomial in \(\frac{1}{.5 - \eta_0}\).
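
A minimal sketch of the noisy example oracle \(EX_{\eta}(c, D)\) assumed above (function names are illustrative, not from any fixed library): each label is the true \(c(x)\), flipped independently with probability \(\eta\).

\begin{verbatim}
import random

def noisy_example(c, draw_x, eta):
    """One draw from EX_eta(c, D): flip the true label with probability eta."""
    x = draw_x()                 # x ~ D
    label = c(x)                 # true label in {0, 1}
    if random.random() < eta:    # independent misclassification
        label = 1 - label
    return x, label
\end{verbatim}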

Statistical Query (SQ) learning model

(Kearns). A \textbf{statistical query} is a pair (criterion, tolerance) \(= (\chi, \tau)\): the oracle returns the probability \(\Pr_x[\chi]\) (or expectation \(E_x[\chi]\)) of \(\chi\) over labelled examples, up to additive tolerance \(\tau\). \(\chi\) must be efficiently evaluatable and \(\tau\) not very tiny (at least inverse polynomial). \(L\) forms \(h\) using only statistical queries.

Show non-constructive SQ learnability

If \(d = \mathrm{sqd}_D(C)\) and \(S_D = \set{f_i}\) is the correlation-shattered set chosen below, then \(\forall f \in C,\ \exists f_i \in S_D: |\mathrm{corr}_D(f_i, f)| > (d+1)^{-1}\). Let \(g_S = \max\set{|\mathrm{corr}_D(f_i, f_j)| : f_i \neq f_j \in S}\) and \(g^{*} = \min \set{g_{S} : |S| = d}\); pick such an \(S\) as \(S_D\). If \(g^{*} \leq (d+1)^{-1}\) then \(\mathrm{sqd}_D \geq d+1\): absurd; so \(g^{*} > (d+1)^{-1}\); so swapping any \(s \in S_D\) with an \(f \notin S_D\) cannot bring the maximum pairwise correlation below \(g^{*}\) (by minimality), which gives the result.

Non-uniform algorithm

So make SQs: for each \(f_i \in S_D\), query \(E[f_i c]\), to find an \(f_j\) with \(|E[f_j c]| > (d+1)^{-1}\); then \(f_j\) (or its negation) satisfies \(\Pr(f_j = c) \geq \tfrac{1}{2} + \tfrac{1}{2(d+1)}\). So this is a weak learning algorithm for \(C\) with running time \(O(d^{t})\), \(t\) a constant.
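
A sketch of the resulting non-uniform weak learner, assuming the shattered set \(S_D\) is hard-wired as a list of \(\pm 1\)-valued functions and an SQ oracle \(sq(\chi, \tau)\) is available (both names hypothetical): query each correlation to tolerance \(\frac{1}{2(d+1)}\) and return the candidate (or its negation) with the largest estimated correlation.

\begin{verbatim}
def weak_learn(candidates, sq):
    """candidates: the functions f_i in S_D, each mapping x to +1/-1.
    sq(chi, tau): returns E[chi(x, c(x))] to within additive tau."""
    d = len(candidates)
    tau = 1.0 / (2 * (d + 1))
    best, best_corr = None, 0.0
    for f in candidates:
        # correlation query: chi(x, label) = f(x) * label, labels in {-1, +1}
        est = sq(lambda x, label, f=f: f(x) * label, tau)
        if abs(est) > abs(best_corr):
            best, best_corr = f, est
    # negative correlation means the negation of f is the weak hypothesis
    return best if best_corr >= 0 else (lambda x, f=best: -f(x))
\end{verbatim}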

Also \(\Omega(d^{t})\): as \(\mathrm{sqd} = \Omega(\mathrm{vcd})\).

Simulate SQ learning

Noise absent: take enough examples to estimate \(P_\chi\), using Chernoff bounds for the sample size. \textbf{Classification noise} present: take guesses \(\eta_i\) from \([0, 1/2)\) at sufficiently small intervals; express \(P_\chi\) as a combination of probabilities over noisy examples and \(\eta\); for each \(\eta_i\) get an \(h_i\); gauge the noisy errors and return the \(h_i\) with least noisy error (which grows with the actual error).
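
A minimal sketch of the noisy simulation of one query, assuming the noise rate \(\eta\) is known exactly (in the full algorithm one tries each guess \(\eta_i\)); it uses the identity \(E[\chi(x, c(x))] = \frac{\tilde{E}[\chi(x, l)] - \eta\, E[\chi(x,0) + \chi(x,1)]}{1 - 2\eta}\), where \(\tilde{E}\) is over noisy examples. Names are illustrative.

\begin{verbatim}
def simulate_sq(chi, noisy_sample, eta):
    """Estimate E[chi(x, c(x))] from noisy examples (x, l), l flipped w.p. eta.

    chi(x, label) -> real; noisy_sample: list of (x, l) pairs.
    Identity: E_noisy[chi(x,l)] = (1-2*eta)*E[chi(x,c(x))]
                                  + eta*E[chi(x,0) + chi(x,1)].
    """
    m = len(noisy_sample)
    noisy_mean = sum(chi(x, l) for x, l in noisy_sample) / m
    # label-independent part, estimable from the x's alone
    both_labels = sum(chi(x, 0) + chi(x, 1) for x, _ in noisy_sample) / m
    return (noisy_mean - eta * both_labels) / (1 - 2 * eta)
\end{verbatim}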

Efficient PAC learnability with noise

Any \(C\) efficiently learnable using SQs is efficiently PAC learnable with classification noise. Still, the PAC model is more powerful than SQ: there is a \(C\) learnable in PAC+noise but not learnable in SQ. (Find proof.) But, folk theorem: most existing PAC algorithms are extensible to SQ algorithms.

(Also learnable using statistical queries): conjunctions. \(k\)-CNF: by feature expansion (treat each possible clause as a new variable, reducing to conjunctions).
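
A sketch of the usual SQ algorithm for conjunctions over \(n\) Boolean variables, with the same hypothetical \(sq(\chi, \tau)\) interface as above: a literal is kept iff the estimated probability that it is falsified on a positive example is negligible.

\begin{verbatim}
def learn_conjunction_sq(n, sq, eps):
    """Learn a conjunction of literals over x_1..x_n with statistical queries.

    sq(chi, tau): returns E[chi(x, c(x))] within tau; x in {0,1}^n, labels in {0,1}.
    """
    tau = eps / (4.0 * n)
    literals = []                        # pairs (index, wanted_value)
    for i in range(n):
        for wanted in (0, 1):            # wanted=1 means x_i, wanted=0 means ~x_i
            # chi = 1 iff the example is positive but falsifies the literal
            chi = (lambda x, label, i=i, w=wanted:
                   1.0 if label == 1 and x[i] != w else 0.0)
            if sq(chi, tau) <= 2 * tau:  # literal essentially never hurts positives
                literals.append((i, wanted))
    def h(x):
        return 1 if all(x[i] == w for i, w in literals) else 0
    return h
\end{verbatim}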

Halfspaces (\why: Blum et al.): an important subroutine for various learning algorithms.

Learning Parity

\(c = p_S\), the parity of the bits indexed by an unknown \(S\); classification noise. Best known algorithm: \(2^{O(n/\log n)}\). \why

Learning \(k\)-PARITY with random noise: measure the errors of all \(O(n^k)\) candidate parities over a random sample of size \(O\!\left(\frac{k \log n}{(1-2\eta)^{2}}\right)\) and output the one with least noisy error.
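
A brute-force sketch under those assumptions (sample given as a list of \((x, l)\) pairs with \(x \in \{0,1\}^n\) and labels in \(\{0,1\}\); names illustrative):

\begin{verbatim}
from itertools import combinations

def learn_k_parity(sample, n, k):
    """Try every parity over at most k of the n bits; return the index set
    whose parity disagrees with the noisy labels least often."""
    best_S, best_err = (), float('inf')
    for size in range(k + 1):
        for S in combinations(range(n), size):
            err = sum(1 for x, l in sample if sum(x[i] for i in S) % 2 != l)
            if err < best_err:
                best_S, best_err = S, err
    return best_S        # predict with the parity of the bits indexed by best_S
\end{verbatim}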

If parity with noise is learnable in polynomial time, DNFs are efficiently learnable under the uniform distribution \(U\): see the reduction from agnostic parity learning.

\oprob: If you can learn parity when there is constant noise, can you learn it when \(\eta = 1/n^{r}\)?

Random persistent classification noise

Classification noise model with a modification: for each \(x\), once \(mq(x)\) has been returned, all future queries \(mq(x)\) elicit the same label.
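
A sketch of a persistently noisy membership oracle (names illustrative): the memoization is exactly what makes repeated queries on the same \(x\) useless.

\begin{verbatim}
import random

def make_persistent_mq(c, eta):
    """Membership oracle with persistent classification noise: the first
    answer for each x is flipped with probability eta, then remembered."""
    memo = {}
    def mq(x):
        if x not in memo:
            label = c(x)
            if random.random() < eta:
                label = 1 - label
            memo[x] = label          # the corrupted answer persists
        return memo[x]
    return mq
\end{verbatim}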

Motivation: random classification noise can be beaten if an \(MQ(c)\) oracle is present: for each example, make \(\mathrm{poly}\!\left(\frac{1}{.5 - \eta_0}\right)\) membership queries and take the majority answer.
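
A sketch of that majority-vote trick, assuming each call to \(mq(x)\) is flipped independently (i.e., non-persistent noise); by a Chernoff bound, \(O\!\left(\frac{\log(1/\delta)}{(1-2\eta)^2}\right)\) repetitions suffice. Names are illustrative.

\begin{verbatim}
import math

def denoised_label(mq, x, eta, delta=0.01):
    """Recover c(x) with probability >= 1 - delta by majority vote over
    repeated, independently noisy membership queries (labels in {0,1})."""
    reps = math.ceil(2.0 * math.log(1.0 / delta) / (1.0 - 2.0 * eta) ** 2)
    ones = sum(mq(x) for _ in range(reps))
    return 1 if 2 * ones > reps else 0
\end{verbatim}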