Regression

General problem

Many (x, y) pairs (observations) are given; the form of h(x, w) (eg: the degree of a polynomial) is known, but the parameters w in h(x, w) are unknown. Want to find w. range(h) is not finite; it is usually continuous.

y is the response variable, and w is called the regression vector. The matrix formed from the x's is often called the design matrix.

Many such continuous valued models are described in the probabilistic models reference.

Linear regression

The problem

h(x, w) is linear in w (eg: \(h(x, w) = w_1 x^3 + w_2 x + w_3\), with basis functions \(\phi_i(x) = x^i\)).

Make the matrix A with each row being the feature vector \(\phi(x)\) at one data point x; take b to be the vector of responses y at those points; the coefficients form the variable vector w.
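Concretely (a standard construction; the symbols \(N\), \(d\), \(x_j\), \(y_j\) are notation assumed here): with \(N\) observations \((x_j, y_j)\) and \(d\) basis functions,

\[ A = \begin{pmatrix} \phi_1(x_1) & \cdots & \phi_d(x_1) \\ \vdots & & \vdots \\ \phi_1(x_N) & \cdots & \phi_d(x_N) \end{pmatrix}, \qquad b = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \]

so that \(Aw \approx b\) when \(h(x, w) = \sum_i w_i \phi_i(x)\) fits the data well.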

The solution

Want to tune w so that it yields the least deviation from b. This deviation is measured using various loss functions.

Quadratic loss function

Get the least squares problem \(\min e(w) = \min \norm{Aw - b}{2}\). This penalizes positive and negative deviations symmetrically, but is sensitive to outliers.
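A minimal numerical sketch of solving this problem (numpy assumed; the quadratic basis and the data are illustrative, not from the source):

    import numpy as np

    # Illustrative data (assumed): y is roughly 1 + 2x plus small noise.
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 50)
    y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.shape)

    # Design matrix A: each row is the feature vector (1, x, x^2) at one data point.
    A = np.column_stack([np.ones_like(x), x, x ** 2])

    # Least squares: minimize ||A w - y||_2 over w.
    w, residuals, rank, svals = np.linalg.lstsq(A, y, rcond=None)
    print(w)  # approximately [1, 2, 0]

A single large outlier in y would pull the fitted w noticeably; that is the sensitivity mentioned above.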

Maximum likelihood estimate with Gaussian noise

If you view y as h(x, w) + Gaussian noise n, the least squares solution is also the maximum likelihood solution.
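To spell this out (a standard derivation; \(\sigma\) denotes the noise standard deviation, assumed constant across observations): if \(b = Aw + n\) with \(n \sim N(0, \sigma^2 I)\), then

\[ -\log p(b \mid w) = \frac{1}{2\sigma^2} \norm{Aw - b}{2}^2 + \text{const}, \]

so maximizing the likelihood over w is exactly minimizing the quadratic loss.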

A noise distribution symmetric about the mean is not sufficient to lead to the least squares solution: \(\min e(w) = \min \norm{Aw - b}{4}\) penalizes deviations from the mean just as symmetrically.

Imposing prior distributions on w

The solutions below assume the quadratic loss function to measure deviation from b. Priors are implied by regularizers in \(\min e(w) = \min \norm{Aw - b}{2} + p(w)\), where p is some penalty function. Usually \(p(w) = \norm{w}{k}\).
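To make the correspondence explicit (a standard MAP argument; \(\pi\) denotes the prior density, notation assumed here): with the Gaussian noise model above and a prior \(\pi(w) \propto e^{-p(w)}\),

\[ \arg\max_w \pi(w \mid b) = \arg\min_w \left[ -\log p(b \mid w) - \log \pi(w) \right] = \arg\min_w \left[ \frac{1}{2\sigma^2} \norm{Aw - b}{2}^2 + p(w) \right], \]

so the penalty is, up to scaling and additive constants, the negative log prior.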

Quadratic regularizer

With a Gaussian prior on w (in addition to the Gaussian noise model above), the maximum a posteriori solution yields the ridge regression problem.
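For instance (a standard result; \(\tau\) denotes the prior standard deviation, a symbol assumed here): a zero-mean Gaussian prior \(w \sim N(0, \tau^2 I)\) gives

\[ \min_w \norm{Aw - b}{2}^2 + \lambda \norm{w}{2}^2, \qquad \lambda = \frac{\sigma^2}{\tau^2}, \]

whose minimizer is \(w = (A^T A + \lambda I)^{-1} A^T b\).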

Priors which prefer sparse w

Can use lasso, or compressed sensing. See optimization ref.
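For reference (the standard lasso objective; \(\lambda\) is a regularization weight): the \(l_1\)-penalized problem

\[ \min_w \norm{Aw - b}{2}^2 + \lambda \norm{w}{1} \]

corresponds to a Laplacian prior on w and tends to produce sparse solutions.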

Statistical efficiency

N samples, d dimensions, s-sparse parameter vector: \(E[\norm{\hat{\theta} - \theta}{2}] \lesssim \sqrt{\frac{s \log d}{N}}\) (up to constants).

Solution

See optimization ref.