Mistake bound (MB) learning model
Problem
Adversarial nature and randomization
The mistake-bounded model is adversarial in nature: we count the number of mistakes made in the worst case. Randomization therefore helps: the adversary must fix the input before the coin is tossed, so, not knowing the algorithm's prediction, its attempts to force mistakes are less likely to succeed.
Traits of MB learners
General traits
Any
Careful algorithm
Efficient learnability in MB model
Show runtime per trial = poly(
Learnability in EQ only model
An MB oracle can be simulated using an EQ oracle: pose the learner's current hypothesis as an equivalence query and feed the returned counterexample to the learner as its next (mistake-forcing) example.
Learnability in PAC model
Any C which is efficiently learnable in the MB model is also efficiently PAC learnable. One PAC algorithm: take the sober algorithm and return a hypothesis consistent with a large enough sample.
Or translate to an EQ algorithm, then convert that to a PAC algorithm (see the sketch below).
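A minimal Python sketch of one standard MB-to-PAC conversion (not necessarily the exact one meant above), assuming a conservative MB learner exposed through hypothetical predict/update methods and an example oracle draw() returning labelled examples from D:
\begin{verbatim}
import math

def mb_to_pac(mb_learner, draw, mistake_bound, eps, delta):
    """Keep the current hypothesis until it errs on a fresh example;
    output it once it survives a long enough streak."""
    # A hypothesis with error > eps survives `streak_needed` i.i.d. examples
    # with probability <= delta / (mistake_bound + 1); union-bound over the
    # at most mistake_bound + 1 hypotheses the learner ever holds.
    streak_needed = math.ceil((1.0 / eps) * math.log((mistake_bound + 1) / delta))
    streak = 0
    while streak < streak_needed:
        x, y = draw()
        if mb_learner.predict(x) == y:
            streak += 1
        else:
            mb_learner.update(x, y)   # happens at most `mistake_bound` times
            streak = 0
    return mb_learner.predict         # the surviving hypothesis
\end{verbatim}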
Lower bound
Halving algorithm
For finite C: predict with the majority vote of all concepts in C consistent with the examples seen so far; every mistake at least halves this version space, so mb \(\le \log_2 |C|\).
The mb of the halving algorithm is a lower bound: no algorithm can make better use of the sample \chk.
\oprob Compare halving algorithm running time with VCD.
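A minimal Python sketch of the halving algorithm, representing concepts as boolean-valued callables (an assumption made purely for illustration):
\begin{verbatim}
def halving_predict(version_space, x):
    """Predict with the majority vote of the concepts still consistent."""
    votes = sum(1 if c(x) else -1 for c in version_space)
    return votes >= 0

def halving_update(version_space, x, y):
    """After the label y is revealed, keep only the agreeing concepts."""
    return [c for c in version_space if c(x) == y]

# On every mistake the majority was wrong, so at least half of the version
# space is discarded: at most log2(|C|) mistakes in total.
\end{verbatim}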
Make Mistake bounded learning algorithm for C using H
Decide the initial h (maybe the null or the universal hypothesis); decide how to update the hypothesis in response to the example provided by the sampling oracle. E.g.: learn disjunctions with mistake bound 2n (sketch below).
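A minimal Python sketch of the elimination algorithm for disjunctions (the mistake bound 2n mentioned above); the encoding of literals is an illustrative choice:
\begin{verbatim}
def learn_disjunction(n, examples):
    """Elimination algorithm for disjunctions over x_1..x_n; mistake bound 2n.

    The hypothesis starts as the disjunction of all 2n literals, so it never
    errs on a positive example; each mistake (necessarily on a negative
    example) deletes at least one literal, hence at most 2n mistakes.
    A literal is (index, sign): sign=True for x_i, sign=False for not-x_i.
    Each example is (x, y) with x a tuple of booleans and y the label.
    """
    literals = {(i, s) for i in range(n) for s in (True, False)}
    mistakes = 0
    for x, y in examples:
        pred = any(x[i] == s for (i, s) in literals)
        if pred != y:
            mistakes += 1
            # remove every literal satisfied by this negative example;
            # literals of the target are never satisfied here, so they survive
            literals = {(i, s) for (i, s) in literals if x[i] != s}
    return literals, mistakes
\end{verbatim}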
Find mistake bound
Enumerate the cases in which the algorithm makes a mistake; find the maximum number of improvements to h possible in each case.
Alternatively, a potential argument: start with some money; show that you never get to 0.
\oprob In MB model: Does ae learnability imply strong ae learnability?
Mistake bounds (mb) for some R
Disjunctions (hence also conjunctions, by duality)
Simplified Winnow
mb = O(k log n), runtime O(n).
Algorithm: Initial
Analysis: On +ve
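A minimal Python sketch of one common form of the simplified Winnow for a monotone disjunction of k out of n variables (the exact update rule here, doubling on false negatives and elimination on false positives, is an assumption, but it does give an O(k log n) mistake bound):
\begin{verbatim}
def winnow_simplified(n, examples):
    """Simplified Winnow: weights start at 1; predict positive iff the
    active weight reaches the threshold n; promote on false negatives,
    eliminate on false positives."""
    w = [1.0] * n
    theta = float(n)
    mistakes = 0
    for x, y in examples:                 # x: tuple of 0/1, y: True/False
        pred = sum(w[i] for i in range(n) if x[i]) >= theta
        if pred != y:
            mistakes += 1
            for i in range(n):
                if x[i]:
                    w[i] = w[i] * 2 if y else 0.0
    return w, mistakes
\end{verbatim}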
Decision lists
Length k. 1-decision lists: keep a bag of candidate rules for the 1st position (bag 1); demote a candidate to bag 2 when its guess is wrong; and so on.
So t-decision lists are efficiently learnable, by \textbf{feature expansion} (treat each conjunction of at most t literals as a new feature); mb =
Decision trees
Rank of dec tree is
Polynomial size DNF f
Reduce to l-augmented dec tree of rank
Halfspace
Given sample set
See the statistics survey for the Winnow algorithm, the perceptron algorithm, SVMs, etc.
Learn parity
Use the GF(2) representation of assignments and parity functions. See the MQ-only algorithm for learning parity.
Could be important in learning Fourier coefficients.
\oprob Parity: Show learnability in IMB model.
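A minimal Python sketch of an MB learner for parities built on the GF(2) view: maintain a linear-algebraic basis of the labelled examples seen so far; whenever the query point lies in their span its label is forced, and every mistake adds an independent vector, so there are at most n mistakes. The bit-vector encoding and the predict/update interface are illustrative choices:
\begin{verbatim}
def parity_mb_learner():
    """Mistake-bounded learner for parity functions over GF(2).
    Points are encoded as Python ints (bit i = attribute i)."""
    basis = {}   # pivot bit -> (vector, label); vector's highest set bit = pivot

    def reduce(x):
        """Reduce x against the basis, accumulating the forced label."""
        label = 0
        for pivot in sorted(basis, reverse=True):      # highest pivot first
            if (x >> pivot) & 1:
                vec, lab = basis[pivot]
                x ^= vec
                label ^= lab
        return x, label

    def predict(x):
        residual, label = reduce(x)
        return label if residual == 0 else 0           # off the span: guess 0

    def update(x, y):
        """Record the labelled example (only needed after a mistake)."""
        residual, label = reduce(x)
        if residual != 0:
            pivot = residual.bit_length() - 1
            basis[pivot] = (residual, y ^ label)

    return predict, update
\end{verbatim}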
Halfspaces: Sample net algorithm for the uniform distribution
Take S labeled points; given a query point p, keep the sample points with positive inner product with p; return the label of the majority.
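A minimal numpy sketch of this voting rule (array shapes and the \(\{-1,+1\}\) label convention are assumptions):
\begin{verbatim}
import numpy as np

def sample_net_predict(points, labels, p):
    """points: m x n array of sample points; labels: array in {-1, +1}.
    Return the majority label among sample points with <point, p> > 0."""
    mask = points @ p > 0
    return 1 if labels[mask].sum() >= 0 else -1
\end{verbatim}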
Halfspaces: Averaging algorithm
Problem
With
Algorithm
Reflect
Proof of goodness
Angle between u,
Show
Show \(u'_{h}\) small: Wlog, take
So,
\oprob Prove that
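A minimal numpy sketch of the averaging algorithm for an origin-centred halfspace \(\mathrm{sign}(\langle u, x\rangle)\), with the \(\{-1,+1\}\) label convention assumed: reflect the negatively labelled points through the origin (the reflection step above) and average.
\begin{verbatim}
import numpy as np

def averaging_algorithm(X, y):
    """X: m x n array of points, y: labels in {-1, +1}.
    Return a classifier built from the average of the reflected points."""
    h = (y[:, None] * X).mean(axis=0)      # estimate of the normal direction u
    return lambda x: 1 if np.dot(h, x) >= 0 else -1
\end{verbatim}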
PTF of degree d
Of a panel of k experts
The problem
You have a panel of experts
Best expert for input sequence
It is possible that no
Performance goal
If you knew who the least erroneous expert was, you would use that expert's verdict alone; but you do not know this. You want an algorithm that combines the verdicts of all the experts in such a way that you do not perform much worse than the best expert.
Weighted majority
Classifier returns
Learning algorithm: init:
Analysis: When the algorithm is wrong,
It is a matter of concentrating weight on the best expert: see also the external regret minimization analysis in the Game Theory ref.
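A minimal Python sketch of deterministic weighted majority (the penalty factor 1/2 and the \(\{-1,+1\}\) conventions are illustrative); the number of mistakes is O(log k + m), where m is the number of mistakes of the best of the k experts:
\begin{verbatim}
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """expert_preds: one list of k predictions in {-1, +1} per round;
    outcomes: the true label of each round."""
    k = len(expert_preds[0])
    w = [1.0] * k
    mistakes = 0
    for preds, truth in zip(expert_preds, outcomes):
        vote = sum(wi * p for wi, p in zip(w, preds))
        if (1 if vote >= 0 else -1) != truth:
            mistakes += 1
        # multiplicatively penalise every expert that was wrong this round
        w = [wi * beta if p != truth else wi for wi, p in zip(w, preds)]
    return mistakes
\end{verbatim}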
Randomized weighted majority
In this version of the algorithm, the classifier returns +1 with probability
The expected number of mistakes after a certain number of rounds (rather than after a certain number of mistakes) can then be analysed by a similar argument, and it turns out to be better. For intuition as to why this is the case, see the note on the adversarial nature of the mistake-bounded learning model.
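A minimal Python sketch of the randomized variant (the penalty parameter eps and the bookkeeping of expected mistakes are illustrative); the expected number of mistakes is roughly \((1+\epsilon/2)m + (\ln k)/\epsilon\) when the best expert makes m mistakes:
\begin{verbatim}
def randomized_weighted_majority(expert_preds, outcomes, eps=0.5):
    """Follow expert i with probability w_i / sum(w); penalise wrong
    experts by a factor (1 - eps).  Returns the expected mistake count."""
    k = len(expert_preds[0])
    w = [1.0] * k
    expected_mistakes = 0.0
    for preds, truth in zip(expert_preds, outcomes):
        total = sum(w)
        # probability of erring this round = weight fraction on wrong experts
        expected_mistakes += sum(wi for wi, p in zip(w, preds) if p != truth) / total
        w = [wi * (1 - eps) if p != truth else wi for wi, p in zip(w, preds)]
    return expected_mistakes
\end{verbatim}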
Comparison with halfspace algorithms
By viewing input bits as experts, learning a halfspace
For comparison with some related halfspace learning algorithms, like Winnow and the perceptron algorithm, see the sections about them in the statistics survey.
Intersection F of k halfspaces
As
Infinite attribute MB learning model (IMB)
Problem and notation
Infinite literals
We can assume that
The key problem
Which attributes are relevant for classification?
Importance
E.g.: a rover on Mars trying to understand the properties of life there.
ae (attribute-efficient) learning
Same as in the MB model, except that n is taken to be the size of the largest example.
Lower bound
Using MB model lower bounds:
Sequential learning algorithm
Algorithm
Init:
Analysis
After
Learning by mapping
Works for pec
Whenever a mistake is made, any new attribute is immediately added to
So, if a pec class is ae learnable in the MB model, it is also learnable in the IMB model. Also, if a pec class is strongly ae learnable in the MB model, it is also strongly ae learnable in the IMB model.
\oprob Show that learnability in IMB implies ae learnability in MB model.
Results
Using
\oprob Learn decision lists in the IMB in time and mistake bound poly(
Expensive Sequential halving algorithm
Sequential learning algorithm where
MBQ and IMBQ models
The MB and IMB models augmented with MQ oracles; we try to learn a pec class. An example of active learning. Motivating scenario: password cracking.
If
Lower bound:
Convert an efficient MBQ algorithm (L) with bound
Predictive power of a consistent (maybe small) h
(a,b)-Occam algorithm for C using H
Give
So, the size of the
This is reminiscent of the predictive power of the Occam razor as used in elucidating the philosophy of science after the Renaissance: make as few assumptions in your theory as possible.
Occam razor: Goodness from consistency
Any efficient Occam algorithm can be used to construct a good PAC algorithm: simply draw
\proof: Using the Chernoff bounds, we can say: The empirical estimate of the goodness of a fixed
In other words, we found a way of saying that
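A sketch of the standard calculation behind this, assuming the usual \((a,b)\)-Occam size bound, i.e. the output hypothesis is described by at most \((n\cdot\mathrm{size}(c))^a m^b\) bits with \(b < 1\): for a sample of size \(m\),
\[
\Pr\bigl[\exists\, h \text{ consistent with the sample with } \mathrm{err}(h) > \epsilon\bigr]
\;\le\; 2^{(n\,\mathrm{size}(c))^a m^b}\,(1-\epsilon)^m
\;\le\; 2^{(n\,\mathrm{size}(c))^a m^b}\, e^{-\epsilon m}
\;\le\; \delta,
\]
which holds once
\[
m \;\ge\; \frac{2}{\epsilon}\ln\frac{1}{\delta}
\qquad\text{and}\qquad
m \;\ge\; \left(\frac{2\ln 2\,(n\,\mathrm{size}(c))^a}{\epsilon}\right)^{\frac{1}{1-b}}.
\]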
VCD - Occam razor: Extension to unbounded C
Even if
Almost consistent h
Suppose that an Occam-like algorithm produces a
Using Hoeffding's inequality, for a fixed
As before, the union bound using either the VCD
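The fixed-\(h\) step can be written out via Hoeffding's inequality (the \(\epsilon/2\) slack is one standard choice):
\[
\Pr\Bigl[\bigl|\widehat{\mathrm{err}}(h) - \mathrm{err}(h)\bigr| > \tfrac{\epsilon}{2}\Bigr]
\;\le\; 2\, e^{-m\epsilon^{2}/2},
\]
so a hypothesis that errs on at most an \(\epsilon/2\) fraction of a sample of size \(m = O\!\bigl(\tfrac{1}{\epsilon^{2}}\log\tfrac{1}{\delta}\bigr)\) has true error at most \(\epsilon\) with probability \(1-\delta\), before taking the union bound.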
Occam with Approximate set cover
Take set
Converse to Occam razor
For any PAC learnable