Empirical risk minimization vs expert systems
One way to predict
Often, one cannot know enough about how a system works to write down explicit rules. E.g., an expert system for handwriting recognition can be harder to build than a trained statistical model. Repeatedly, in various fields, statistics-based automatic learning of rules has outperformed manually developed expert systems, e.g. in natural language processing.
Here, we mainly consider statistical learning.
Hypothesis classes
The general discussion of hypothesis classes in the Decision Theory chapters applies here.
Suppose that the labeling rule, on observing
Probabilistic models
As the hypothesis class, one may choose a family of probabilistic models for
Model/parameter selection for such hypothesis classes is described in the distribution structure learning part.
Mean or Mode models
Alternately, rather than modeling the pdf
This may actually correspond to proposing deterministic labeling functions which return either label mode or the mean (in case of vector labels). In the latter case, we may also be interested in specifying or modeling
Comparison to probabilistic models
Depending on the loss functions, the risk of different randomized labeling rules with the same expectation may be different. So, when it is reliably possible to do so, modeling the pdf
Yet, task of modeling the expectation or mode of
Probabilistic models: comparison
Often, modeling
But, modeling
Discriminative model corresponding to generative model
Consider the form of the discriminative model
Ease in using unlabeled points
Suppose that the distribution family
If we have unlabeled observations
Discrete deterministic labeling rules
Decision surfaces
One can view discrete labeling rules
k-ary classifier from binary classifier
k-ary classification is reducible to binary classification in many ways.
One can learn many ‘one against rest’ classifiers, one per class, and then assign the class whose classifier places the point deepest on the positive side of its decision hyperplane (largest signed distance).
One can learn many ‘one against one’ classifiers, one per pair of classes, and then use majority vote for prediction. This approach often results in ambiguous regions, where the vote is tied.
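The two reductions above can be sketched as follows. This is a minimal illustration, not a recommended learner: the binary classifier used here (`fit_binary`, a hyperplane halfway between the two class means) and the toy data are assumptions made only to keep the example self-contained; any binary linear learner could be substituted.

```python
import itertools
import numpy as np

def fit_binary(X_pos, X_neg):
    """Toy binary learner (an assumption for illustration): the hyperplane
    w.x + b = 0 that bisects the segment joining the two class means."""
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    w = mu_pos - mu_neg
    b = -w @ (mu_pos + mu_neg) / 2.0
    return w, b

def signed_distance(X, w, b):
    """Signed distance of each row of X from the hyperplane (positive side = w direction)."""
    return (X @ w + b) / np.linalg.norm(w)

def fit_one_vs_rest(X, y, k):
    """One binary classifier per class: class c against all other classes."""
    return [fit_binary(X[y == c], X[y != c]) for c in range(k)]

def predict_one_vs_rest(models, X):
    """Assign the class whose hyperplane gives the largest signed distance."""
    dists = np.column_stack([signed_distance(X, w, b) for w, b in models])
    return dists.argmax(axis=1)

def fit_one_vs_one(X, y, k):
    """One binary classifier per unordered pair of classes (i, j), i < j."""
    return {(i, j): fit_binary(X[y == i], X[y == j])
            for i, j in itertools.combinations(range(k), 2)}

def predict_one_vs_one(models, X, k):
    """Each pairwise classifier casts one vote; predict the class with most votes.
    argmax breaks ties by lowest class index, which hides ambiguous regions."""
    votes = np.zeros((X.shape[0], k), dtype=int)
    for (i, j), (w, b) in models.items():
        pos = signed_distance(X, w, b) > 0
        votes[pos, i] += 1
        votes[~pos, j] += 1
    return votes.argmax(axis=1), votes

# Toy data: three well-separated square clusters in the plane.
square = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
X = np.vstack([square, square + [10, 0], square + [0, 10]])
y = np.repeat([0, 1, 2], 4)

ovr_pred = predict_one_vs_rest(fit_one_vs_rest(X, y, 3), X)
ovo_pred, ovo_votes = predict_one_vs_one(fit_one_vs_one(X, y, 3), X, 3)
```

Inspecting `ovo_votes` rather than only its argmax is one way to detect the ambiguous regions mentioned above: a point is ambiguous when the maximum vote count is shared by more than one class.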
Curse of dimensionality
\tbc Exponential increase in volume associated with adding extra dimensions: can’t calculate and record
See also curse of dimensionality subsection in the clustering problem.
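As a rough numerical illustration of the exponential growth in volume, the sketch below counts the cells of a regular grid and computes how large a sub-cube must be to cover a given fraction of a unit hypercube. The function names and the choice of a unit hypercube are assumptions made for this example only.

```python
def grid_cells(bins_per_axis, dims):
    """Number of cells in a regular grid with the given number of bins per axis."""
    return bins_per_axis ** dims

def edge_for_fraction(fraction, dims):
    """Edge length of a sub-cube that covers `fraction` of the unit hypercube's volume."""
    return fraction ** (1.0 / dims)

# With 10 bins per axis, the number of cells to fill grows as 10**dims.
cell_counts = {d: grid_cells(10, d) for d in (1, 2, 3, 10)}

# To capture 1% of the volume, the required edge length approaches 1 as dims grows.
edges = {d: edge_for_fraction(0.01, d) for d in (1, 2, 10, 100)}
```

With 10 bins per axis, 3 dimensions already need 1000 cells and 10 dimensions need 10**10, so a table indexed by grid cell quickly becomes infeasible to fill from data.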