General problem
Given many (x, y) pairs (observations); the form of h(x, w) (e.g., degree of a polynomial) is known; the parameters w must be estimated.
y is the response variable; x is the vector of covariates (features).
Many such continuous-valued models are described in the probabilistic models reference.
Linear regression
The problem
h(x, w) is linear in w: \(h(x, w) = w^T \phi(x)\) for fixed features \(\phi(x)\).
Build a matrix A with each feature vector as a row, and stack the responses into a vector b; a sketch follows below.
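A minimal sketch of building A, assuming scalar x and polynomial features (the x values and degree here are made up for illustration):

```python
import numpy as np

# Made-up inputs and polynomial degree, purely for illustration.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
degree = 2

# Row i of A is the feature vector phi(x_i) = (1, x_i, x_i^2, ..., x_i^degree).
A = np.vander(x, N=degree + 1, increasing=True)  # shape (len(x), degree + 1)
```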
The solution
Want to tune w so that Aw is close to b.
Quadratic loss function: \(e(w) = \norm{Aw - b}{2}^2\).
Get the least squares problem \(\min_w \norm{Aw - b}{2}^2\); setting the gradient to zero gives the normal equations \(A^T A w = A^T b\). A solver sketch follows below.
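A minimal least-squares solve continuing the sketch above (the responses b are made up; lstsq is preferred over forming the normal equations directly, for numerical robustness):

```python
import numpy as np

A = np.vander(np.array([0.0, 0.5, 1.0, 1.5, 2.0]), N=3, increasing=True)
b = np.array([1.0, 1.2, 2.1, 3.4, 5.2])  # made-up responses

# Minimize ||A w - b||_2^2; lstsq solves this via SVD, handling rank deficiency.
w, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(w)
```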
Maximum likelihood estimate with Gaussian noise
If you view y as h(x, w) + Gaussian noise n, the least squares solution is also the maximum likelihood solution.
A noise distribution symmetric about the mean is not sufficient to lead to the least squares solution; e.g., Laplace noise, also symmetric, leads to least absolute deviations (see below).
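To make this concrete (a standard derivation; \(a_i^T\) denotes row i of A, and \(\sigma, \beta\) are noise scales):

\[ \text{Gaussian noise: } -\log p(b \mid w) = \frac{1}{2\sigma^2} \sum_i (a_i^T w - b_i)^2 + \text{const}, \]
\[ \text{Laplace noise: } -\log p(b \mid w) = \frac{1}{\beta} \sum_i |a_i^T w - b_i| + \text{const}. \]

Minimizing the first is least squares; minimizing the second is least absolute deviations, even though both noise distributions are symmetric about 0.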
Imposing prior distributions on w
Solutions below assume a quadratic loss function to measure deviation from b. Priors are implied by regularizers in \(\min_w e(w) = \min_w \norm{Aw - b}{2}^2 + p(w)\), where p is some penalty function; usually \(p(w) = \lambda \norm{w}{k}\) for some weight \(\lambda > 0\).
Quadratic regularizer
Assuming Gaussian noise and a Gaussian prior on w, the maximum a posteriori (MAP) solution yields the ridge regression problem \(\min_w \norm{Aw - b}{2}^2 + \lambda \norm{w}{2}^2\); a sketch follows below.
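A minimal sketch of the closed-form ridge solve (A, b, and the weight lam are made-up stand-ins):

```python
import numpy as np

A = np.vander(np.array([0.0, 0.5, 1.0, 1.5, 2.0]), N=3, increasing=True)
b = np.array([1.0, 1.2, 2.1, 3.4, 5.2])  # made-up responses
lam = 0.1  # regularization weight; larger values pull w harder toward 0

# Ridge: minimize ||A w - b||_2^2 + lam * ||w||_2^2.
# Closed form: w = (A^T A + lam * I)^{-1} A^T b.
d = A.shape[1]
w = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
print(w)
```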
Priors which prefer sparse w
Can use the lasso (\(\ell_1\) penalty, corresponding to a Laplace prior on w) or compressed sensing; a solver sketch follows below. See optimization ref.
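Since the details are deferred to the optimization ref, here is only a rough sketch of one standard lasso solver, ISTA (proximal gradient with soft thresholding), for \(\min_w \frac{1}{2}\norm{Aw - b}{2}^2 + \lambda \norm{w}{1}\); the problem data is made up:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward 0 by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, b, lam, n_iters=500):
    # ISTA for (1/2)||A w - b||_2^2 + lam * ||w||_1.
    # Step size 1/L, with L the largest eigenvalue of A^T A, ensures convergence.
    L = np.linalg.norm(A, ord=2) ** 2   # squared spectral norm of A
    w = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ w - b)        # gradient of the smooth part
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Made-up problem: sparse ground truth, noisy observations.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[[2, 7]] = [1.5, -2.0]
b = A @ w_true + 0.1 * rng.standard_normal(50)
print(lasso_ista(A, b, lam=1.0))  # recovered w should be sparse
```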
Statistical efficiency
Given N samples, how accurately can w be estimated, and how does the error decay as N grows?
Solution
See optimization ref.