Importance
Fitting a model to observations, i.e. picking a probability distribution from a family of distributions, is an important component of many statistical tasks in which one reasons about uncertainty by explicitly using probability theory. When the fitting is done by conditioning a prior distribution on the observations, such tasks are labeled 'Bayesian inference'.
Choosing the distribution family
Observe empirical distribution
Draw a bar graph of the observed frequencies and see what the curve looks like.
Given expected values of functions and a base measure h
Suppose we want to modify $h$ as little as possible, under KL divergence, so that it has the specified expected values for a given set of functions $f_i$.
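A sketch of the standard derivation (the constraint values $c_i$ and multipliers $\theta_i$ are notation introduced here): minimize the KL divergence to $h$ subject to the moment constraints, using Lagrange multipliers.

\[
\min_{p}\; D(p \,\|\, h) = \int p(x)\,\log\frac{p(x)}{h(x)}\,dx
\quad\text{s.t.}\quad \int p(x)\,f_i(x)\,dx = c_i,\;\; \int p(x)\,dx = 1.
\]

Setting the functional derivative of the Lagrangian to zero yields an exponential family over the base measure $h$:

\[
p(x) = \frac{h(x)}{Z(\theta)}\,\exp\Big(\sum_i \theta_i f_i(x)\Big),
\qquad
Z(\theta) = \int h(x)\,\exp\Big(\sum_i \theta_i f_i(x)\Big)\,dx,
\]

with the $\theta_i$ chosen so that the constraints hold.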
Given dependence among features
Use graphical models; see the probability reference.
Parametric density estimation
Described in a separate chapter.
Non-parametric probability density estimation
Estimate the distribution on the input space directly from the observed samples $x_1, \dots, x_n$, without fitting a parametric family.
Histogram and the kernel histogram
A distribution from a bar graph of frequency vs. input interval: one can simply use a histogram, i.e. partition the input range into bins and report the normalized count in each bin.
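As a sketch, a histogram density estimate might be computed as follows; the sample, bin count, and query point are illustrative assumptions, not from the source.

    import numpy as np

    # Illustrative sample (an assumption): 1000 draws from a standard normal.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)

    # Normalized bin counts form a piecewise-constant density estimate
    # (density=True divides counts by n * bin_width so the bars integrate to 1).
    counts, edges = np.histogram(x, bins=30, density=True)

    # The density estimate at a query point is the height of its bin.
    def hist_density(q):
        i = np.searchsorted(edges, q, side="right") - 1
        return counts[i] if 0 <= i < len(counts) else 0.0

    print(hist_density(0.0))  # roughly 0.4 for standard-normal data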
Kernel density estimation
(Parzen).
Kernel function for density estimation
A non-negative, real-valued, integrable function $K(x)$ satisfying $\int_{-\infty}^{\infty} K(x)\,dx = 1$ (so that the resulting estimate is itself a density) and, usually, the symmetry condition $K(-x) = K(x)$.
Using Gaussian radial basis functions
Aka the Gaussian kernel, $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
Taking 1 Gaussian distribution, i.e. adding 1 bump, for each data point, and averaging:

\[
\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\Big(\frac{x - x_i}{h}\Big).
\]

The parameter $h$, controlling the variance (width) of each bump, is called the smoothing parameter or bandwidth.
(Can approximate any distribution by mixture of Gaussians!)
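A minimal sketch of the Parzen estimate with a Gaussian kernel; the data, grid, and bandwidth are illustrative assumptions.

    import numpy as np

    def gaussian_kde(query, data, h):
        # One Gaussian bump of width h per data point, averaged:
        # f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h).
        u = (query[:, None] - data[None, :]) / h
        bumps = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
        return bumps.mean(axis=1) / h

    rng = np.random.default_rng(0)
    data = rng.standard_normal(500)
    grid = np.linspace(-4, 4, 9)
    print(gaussian_kde(grid, data, h=0.3))  # approximates the N(0,1) density

Smaller bandwidths give spikier estimates; larger ones oversmooth.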
Estimate probability measures
Use empirical measures
Empirical measure
The estimated measure using the observed samples: for an event $A$,

\[
P_n(A) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \in A\},
\]

i.e. the fraction of samples that fall in $A$.
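As a sketch, the empirical measure of an event is just the fraction of samples landing in it; the event $A = [0.5, 1.5]$ and the data are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000)

    # P_n(A) for A = [0.5, 1.5]: fraction of samples in A.
    P_n = np.mean((x >= 0.5) & (x <= 1.5))
    print(P_n)  # close to Phi(1.5) - Phi(0.5), approximately 0.24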
Goodness of estimate: single event
By the law of large numbers, as $n \to \infty$, $P_n(A) \to P(A)$ almost surely.
Bound variability in estimate
From the central limit theorem, we know that, as $n \to \infty$, $\sqrt{n}\,\big(P_n(A) - P(A)\big)$ converges in distribution to $N\big(0,\, P(A)(1 - P(A))\big)$.
Also, we can use Hoeffding's inequality: $\Pr\big(|P_n(A) - P(A)| \ge \epsilon\big) \le 2 e^{-2n\epsilon^2}$.
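A quick numerical check of the Hoeffding bound, as a sketch; the true probability $P(A)$, sample size, and tolerance are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, eps, trials = 500, 0.05, 2000
    p = 0.3  # assumed true probability P(A), known here only for the check

    # Simulate P_n(A) repeatedly; each simulation draws n indicator variables.
    P_n = rng.binomial(n, p, size=trials) / n
    miss_rate = np.mean(np.abs(P_n - p) > eps)

    bound = 2 * np.exp(-2 * n * eps**2)
    print(miss_rate, bound)  # miss_rate stays below the Hoeffding bound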
Goodness of estimate for a class of events
Let $\mathcal{A}$ be a class of events, and let $S(\mathcal{A}, n)$ denote its shatter coefficient: the largest number of distinct subsets of an $n$-point sample that events in $\mathcal{A}$ can pick out.
(Vapnik, Chervonenkis).
Then

\[
\Pr\Big(\sup_{A \in \mathcal{A}} \big|P_n(A) - P(A)\big| > \epsilon\Big) \le 8\, S(\mathcal{A}, n)\, e^{-n\epsilon^2/32}.
\]
Proof
If we were to use the union bound and the Hoeffding inequality naively, we would have a factor of $|\mathcal{A}|$ in place of $S(\mathcal{A}, n)$, which is vacuous when $\mathcal{A}$ is infinite.
So, first we show (symmetrization) that the deviation from the true measure can be replaced by the deviation between two independent samples:

\[
\Pr\Big(\sup_{A \in \mathcal{A}} \big|P_n(A) - P(A)\big| > \epsilon\Big) \le 2\,\Pr\Big(\sup_{A \in \mathcal{A}} \big|P_n(A) - P'_n(A)\big| > \epsilon/2\Big),
\]

where $P'_n$ is the empirical measure of an independent 'ghost' sample of size $n$.
Proof
So, now we need only bound the deviation over the projections of $\mathcal{A}$ onto the $2n$ sample points; there are at most $S(\mathcal{A}, 2n)$ distinct such projections, so the union bound applies with this finite factor in place of $|\mathcal{A}|$.
Sharper bound
The sharper bound we require can be obtained using a different analysis, involving
Estimate CDF using empirical CDF
(Glivenko-Cantelli). Pick $X_1, \dots, X_n$ iid with CDF $F$ and form the empirical CDF $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$. Then $\sup_x |F_n(x) - F(x)| \to 0$ almost surely as $n \to \infty$.
Answer to: What is $\Pr(X \le x)$ for each threshold $x$?
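A sketch illustrating the convergence numerically: the sup-distance between the empirical and true CDFs (here for a standard normal, an illustrative assumption) shrinks as $n$ grows. Note that the events $\{X \le x\}$ form a class of half-lines, so this is also an instance of the uniform bound above.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    for n in (100, 1_000, 10_000):
        x = np.sort(rng.standard_normal(n))
        F = norm.cdf(x)  # true CDF at the sample points
        # sup_x |F_n(x) - F(x)| is attained at a sample point; check F_n
        # just before and at each x_i.
        upper = np.arange(1, n + 1) / n   # F_n(x_i)
        lower = np.arange(0, n) / n       # F_n just below x_i
        sup_dev = max(np.abs(upper - F).max(), np.abs(lower - F).max())
        print(n, sup_dev)  # decreases toward 0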