Aka COLT / computational learning theory: the theory of binary classification and queries.
Themes
Consider the classification task described in the statistics survey. Computational learning theory is about the binary classification problem: getting the best classifier under 0/1 loss.
Hardness of (accurately/approximately) learning a concept in a concept class, in terms of time, space, and sample size: upper bounds (\textbf{efficient learning algorithm design}) and lower bounds. Axes along which this is done: agnostic of the distribution or distribution-specific; in the presence or absence of noise (in the label or in the input); relative to the ease of evaluating a concept; proper learning vs. learning using a different hypothesis class.
What concept classes are more general?
How many mistakes does an algorithm make while efficiently learning a concept class? Learning using a bunch of experts (see the weighted-majority sketch after this list).
Judge the importance of a problem/model: consider real-life learning scenarios where it is useful; consider whether a lower/upper bound for some important scenario may be obtained.
Interplay with other areas.
Observe similarities between compression and learning: learning that is more than mere memorization (kNN being an example to the contrary) usually involves extracting something concise and general from the examples.
Cryptography and learning: opposites.
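To make the experts theme concrete: a minimal Python sketch of the classic weighted-majority algorithm (Littlestone--Warmuth). The interface and the parameter beta = 0.5 are illustrative choices, not from the source.
\begin{verbatim}
# Weighted Majority (Littlestone-Warmuth), a minimal sketch.
# Predict by weighted vote; shrink the weight of each expert that errs.
def weighted_majority(rounds, beta=0.5):
    """rounds: iterable of (votes, y): votes is a list of 0/1 expert
    predictions, y the true label. Returns the master's mistake count."""
    weights = None
    mistakes = 0
    for votes, y in rounds:
        if weights is None:
            weights = [1.0] * len(votes)
        w1 = sum(w for w, v in zip(weights, votes) if v == 1)
        w0 = sum(w for w, v in zip(weights, votes) if v == 0)
        prediction = 1 if w1 >= w0 else 0
        if prediction != y:
            mistakes += 1
        # Multiplicatively penalize the experts that were wrong.
        weights = [w * beta if v != y else w
                   for w, v in zip(weights, votes)]
    return mistakes
\end{verbatim}
The master makes O(m* + log N) mistakes, where N is the number of experts and m* the best expert's mistake count: the flavor of mistake bound asked about above.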
Folk theorems
The only concept classes C we really know how to learn are, in essence, polynomials (e.g., PTFs).
Characterization of research effort
See computational complexity theory ref and algorithms ref; emphasis is on the former.
Notation
The sample: S.
Universal set of possible inputs: X.
Concepts and their sets: concept c, concept class C.
Target concept (classifier function) in a certain representation: c.
Learning algorithm: A.
The binary classifier c
For different ways of looking at c, see boolean fn ref. For VC-dimension (VCD) proofs / measuring the complexity of representation classes, it is profitable to view a hypothesis as the dichotomy it induces on X.
Representation classes of classifiers
Properties of representation classes
Error regions
Class of error regions: $\Delta(c) = \{c \oplus h : h \in C\}$, the symmetric difference of the target with each hypothesis.
Epsilon net of examples: a set S of examples is an $\epsilon$-net for the class of error regions (under D) if every error region with probability at least $\epsilon$ contains a point of S.
Monotonicity
To understand monotonicity of boolean functions, see boolean functions reference.
A monotone representation class R can be extended to express non-monotone concepts by introducing n new (negated) literals, if n is finite; see the sketch below.
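A tiny Python sketch of this trick (names illustrative): double the variable set with negations, so a concept using negated literals becomes monotone over the extended domain.
\begin{verbatim}
# Map x in {0,1}^n to (x, 1-x) in {0,1}^(2n): a concept using negated
# literals over x becomes a monotone concept over the doubled domain.
def add_negated_literals(x):
    return list(x) + [1 - xi for xi in x]

# x1 AND NOT x2 over x = (x1, x2) equals the monotone conjunction
# y1 AND y4 over y = (x1, x2, 1-x1, 1-x2).
x = (1, 0)
y = add_negated_literals(x)
assert bool(x[0] and not x[1]) == bool(y[0] and y[3])
\end{verbatim}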
Closures
Projection closure
Take a partial assignment P; let P/A be the assignment to V obtained by first using P, then using A for the unassigned variables. The projection of an example using P is defined likewise. Projection-closed concept class: take any f in C; every projection of f under a partial assignment P is also in C.
Embedding closure
Embedding one domain of variables into another; embedding a concept into a domain of variables; embedding an example. Embedding-closed concept class: take any f in C, rename variables, and add irrelevant variables to the domain; the resulting concept is also in C.
PEC classes
Most useful concept classes are projection and embedding closed (pec), or are contained in some projection and embedding closed class.
Some representation classes
Geometric classes
Also see boolean functions ref for halfspaces etc. Axis-aligned rectangles (a good basic example; a learner is sketched below).
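The textbook PAC learner for axis-aligned rectangles, as a minimal Python sketch: output the tightest rectangle enclosing the positive examples. Function and variable names are illustrative.
\begin{verbatim}
# Tightest-fit learner for axis-aligned rectangles in the plane.
def tightest_rectangle(samples):
    """samples: iterable of ((x, y), label), label in {0, 1}.
    Returns (xmin, xmax, ymin, ymax), or None if no positives seen."""
    positives = [p for p, label in samples if label == 1]
    if not positives:
        return None  # hypothesis: all-negative
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    if rect is None:
        return 0
    xmin, xmax, ymin, ymax = rect
    x, y = point
    return int(xmin <= x <= xmax and ymin <= y <= ymax)
\end{verbatim}
The hypothesis never exceeds the target, so the error region is a union of four strips; with $O((1/\epsilon)\log(1/\delta))$ samples each strip is unlikely to carry mass $\epsilon/4$, giving the PAC guarantee.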
Boolean functions
See boolean functions ref: k-DNF, k-CNF, conjunctions, disjunctions, decision lists, decision trees, T-augmented decision trees, polynomial-size DNF, D-PTF. Halfspaces: a common benchmark for machine-learning algorithms. Intervals in $R$. Polyhedra in $R^n$.
Classes of Continuous fns
See complex analysis ref.
Restricted concept classes
Some PAC Reductions
k-CNF to conjunctions (sketched below). k-decision list to 1-decision list.
Boolean formula to log-space Turing machine: parse the formula; have an identifier for each node.
Log-space Turing machine concept class (max $k \log n$ space) to Deterministic Finite Automaton (DFA) concept class: DFA state = configuration of the TM (internal state, head positions, work-tape contents); with $O(k \log n)$ space there are only polynomially many such configurations.
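A Python sketch of the first reduction in this list: expand each example with one feature per disjunction of at most k literals, so the target k-CNF becomes a plain conjunction over the new features. The enumeration scheme here is illustrative.
\begin{verbatim}
# Reduction sketch: to learn k-CNF, expand each example with one
# feature per disjunction of at most k literals; the target k-CNF is
# then a plain (monotone) conjunction over the O((2n)^k) new features.
from itertools import combinations

def clause_features(x, k):
    """x: tuple of 0/1 values. Returns the truth value (0/1) of every
    disjunction of at most k literals over x and its negations."""
    n = len(x)
    literals = list(x) + [1 - xi for xi in x]  # x1..xn, then negations
    features = []
    for size in range(1, k + 1):
        for clause in combinations(range(2 * n), size):
            features.append(int(any(literals[i] for i in clause)))
    return features
\end{verbatim}
Feed the transformed examples to any conjunction learner; the cost is a poly($n^k$) blow-up in the number of variables.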
Measure complexity of C
Consider the measures below.
SQ dimension (sqd or d)
Definition
Take concepts $c: X \to \{-1, +1\}$ and a distribution D; define the correlation $\langle f, g \rangle_D = E_{x \sim D}[f(x) g(x)]$. sqd(C, D) is the largest d such that C contains $f_1, \ldots, f_d$ with $|\langle f_i, f_j \rangle_D| \le 1/d$ for all $i \ne j$.
Applications: learning in the presence of noise.
Bounds on SQD
Max sqd = max over distributions D of sqd(C, D).
Other properties
All functions in the corr-shattered set are nearly orthogonal under D: pairwise correlations are at most 1/d in magnitude.
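A small Python check of why parities make the SQ dimension huge: under the uniform distribution, distinct $\pm 1$-valued parities are exactly orthogonal, so the pairwise-correlation condition holds for the whole class. The index sets chosen are illustrative.
\begin{verbatim}
# Distinct parity functions are exactly orthogonal under the uniform
# distribution, so the parity class has huge SQ dimension -- the
# classic source of SQ (and hence noise-tolerant) hardness.
from itertools import product

def parity(S, x):  # chi_S(x) in {-1, +1}
    return (-1) ** sum(x[i] for i in S)

n = 4
points = list(product([0, 1], repeat=n))
S, T = (0, 2), (1, 2, 3)  # two distinct index sets (illustrative)
corr = sum(parity(S, x) * parity(T, x) for x in points) / len(points)
assert corr == 0.0  # pairwise correlation 0 <= 1/d for any d
\end{verbatim}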
Other measures
\textbf{Unique -ve dimension}: UND(C) = max size of a subclass $C' \subseteq C$ such that each concept in $C'$ has a negative example unique to it within $C'$ (every other concept in $C'$ labels that point positive).
Can use covering and packing numbers. These measures of the size of sets are described in the topology ref.
Resources used by the learner
Queries and oracles
An oracle provides examples according to D. A teacher answers membership and equivalence queries.
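To fix interfaces, a minimal Python sketch of the oracles just named: EX(c, D) returning labeled random examples, and a teacher answering membership (MQ) and equivalence (EQ) queries. All names are illustrative.
\begin{verbatim}
# Interfaces for the oracles named above; all names illustrative.
import random

def make_example_oracle(c, sample_from_D):
    """EX(c, D): draw x ~ D, return the labeled example (x, c(x))."""
    def EX():
        x = sample_from_D()
        return x, c(x)
    return EX

class Teacher:
    """Answers membership (MQ) and equivalence (EQ) queries for c."""
    def __init__(self, c, domain):
        self.c, self.domain = c, domain
    def membership(self, x):       # MQ: the value c(x)
        return self.c(x)
    def equivalence(self, h):      # EQ: yes, or a counterexample
        for x in self.domain:
            if h(x) != self.c(x):
                return False, x
        return True, None

# Usage: target conjunction x1 AND x2 under the uniform distribution.
c = lambda x: int(x[0] == 1 and x[1] == 1)
EX = make_example_oracle(c, lambda: (random.randint(0, 1),
                                     random.randint(0, 1)))
\end{verbatim}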
Example oracle
Probabilistic oracle
PEX(f) for fn
Statistical dist:
Non adaptive mq (namq)
Points on which MQs are made do not depend on the concept being learnt. So many concepts can be learnt from the same set of query points: like the brain. Also, easier parallelizability.
Relative power of various queries
Since they can compute the iterated-products problem, polynomial-sized circuits are not learnable using random examples (under cryptographic assumptions). By modifying the circuits to encode the circuit at certain fixed points, we get a class learnable with namq but not with random examples alone. By placing the encoding at varying locations, but encoding the locations of the encodings at certain fixed points, we get a class learnable with mq, but not with namq.
Attribute efficient (ae) learning
r: number of relevant attributes; n: total number of attributes.
Alg is poly time; the number of calls to the oracle is poly(r, log n).
Strong ae learning: the number of oracle calls is polynomial in r with only polylog(n) dependence on n (Winnow, sketched below, is the canonical ae example).
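Winnow (Littlestone) is the standard attribute-efficient algorithm: for monotone disjunctions of r out of n variables its mistake bound is O(r log n), i.e., poly(r, log n) as the definition requires. A minimal Python sketch with the usual parameters.
\begin{verbatim}
# Winnow (Littlestone): multiplicative updates give a mistake bound of
# O(r log n) for monotone disjunctions over r of n variables --
# poly(r, log n), as attribute-efficiency demands. Minimal sketch.
def winnow(stream, n, alpha=2.0):
    """stream: iterable of (x, y), x a 0/1 vector of length n.
    Yields the prediction for each example."""
    theta = n / 2.0            # the usual threshold
    w = [1.0] * n
    for x, y in stream:
        yhat = int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)
        yield yhat
        if yhat == 1 and y == 0:    # false positive: demote
            w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
        elif yhat == 0 and y == 1:  # false negative: promote
            w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
\end{verbatim}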