Matrix approximation

Approximating a matrix

The problem

Take a set of matrices S. Want \(\arg\min_{B \in S} \norm{A-B}\): the error is minimized wrt some norm \(\norm{.}\) and some set S.

The error metric

Often, \(\norm{.}\) is an operator norm, as this ensures that \(\norm{(A-B)x}\) is small relative to \(\norm{x}\) in the corresponding vector norm. Other times, as in the case of matrix completion problems, it may be desirable for \(\norm{.}\) to be the Frobenius norm.
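
For concreteness, a tiny numpy sketch (arbitrary matrices, purely illustrative) computing both metrics and checking the operator-norm guarantee on a unit vector:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
B = A + 0.01 * rng.standard_normal((5, 4))    # some approximation of A

spectral_err  = np.linalg.norm(A - B, 2)      # operator (spectral) norm
frobenius_err = np.linalg.norm(A - B, 'fro')  # Frobenius norm

# The operator norm bounds the error on any unit vector x:
x = rng.standard_normal(4)
x /= np.linalg.norm(x)
assert np.linalg.norm((A - B) @ x) <= spectral_err + 1e-12
```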

Low rank (k) factorization problem

In many problems, S is the set of rank k matrices, where k is small.

Often, we prefer to keep \(A \approx B = XY^{T}\) in factored form, where rank(X), rank(Y) \(\leq k\), rather than computing B explicitly: A may be sparse, but the best B may be dense, so we may run out of memory while storing B, and out of time while computing Bx. You can always get the factorization \(B = XY^{T}\) just by taking the SVD of B.
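
A small numpy sketch (dimensions made up) of why the factored form is preferable for storage and for matrix-vector products:

```python
import numpy as np

m, n, k = 10_000, 8_000, 20
rng = np.random.default_rng(0)
X = rng.standard_normal((m, k))   # in practice X, Y would come from a truncated SVD
Y = rng.standard_normal((n, k))
x = rng.standard_normal(n)

# Forming B = X @ Y.T needs m*n floats; the factors need only (m + n)*k.
# The matrix-vector product also stays cheap: O((m + n)k) instead of O(mn).
Bx = X @ (Y.T @ x)
```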

Restriction to subspace view

Similarly, we may want the restriction of A to a low-dimensional subspace: \(B = QQ^{T}A\), where Q is a tall matrix with a few orthonormal columns; but it should be the best such subspace, the one which minimizes the error. A good Q tries to approximate the range space of A.

Sparse approximation

Maybe we want \(\min_B \norm{B-A}_2 : \norm{B}_0 \leq k\). This can be solved just by setting B = A, and then keeping only the k largest-magnitude entries \(B_{i,j}\), zeroing out the rest.
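
A minimal numpy sketch of this hard-thresholding step (the helper name is hypothetical):

```python
import numpy as np

def sparsify_keep_k(A, k):
    """Keep only the k largest-magnitude entries of A, zeroing out the rest."""
    B = np.zeros_like(A)
    if k > 0:
        flat = np.argsort(np.abs(A), axis=None)[-k:]   # indices of the k largest |A_{i,j}|
        idx = np.unravel_index(flat, A.shape)
        B[idx] = A[idx]
    return B

A = np.array([[3.0, -0.1], [0.2, -2.0]])
print(sparsify_keep_k(A, 2))   # keeps 3.0 and -2.0
```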

\(l_1\) norm regularized form

Even the optimization problem \(\min_B \norm{B-A}_{F}^{2} + l\norm{B(:)}_1\) leads to sparse solutions. This form is useful in some sparse matrix completion ideas. Solution: just take B = A, drop the entries with \(|A_{i,j}| \leq l/2\), and shrink the rest towards 0 by \(l/2\) (soft thresholding!).

Pf: \(\min_B \sum_{i,j}(B_{i,j}-A_{i,j})^2 + l\sum_{i,j} sgn(B_{i,j})B_{i,j}\). Let B' be the solution to this. If \(A_{i,j} \geq 0\), so is \(B'_{i,j}\); if \(A_{i,j} \leq 0\), so is \(B'_{i,j}\): if they opposed in sign, setting \(B_{i,j} = 0\) would definitely lower the objective. So, we get the equivalent problem: \(\min_B \sum_{i,j}(B_{i,j}-A_{i,j})^2 + l\sum_{i,j} sgn(A_{i,j})B_{i,j}\) subject to \(sgn(B) = sgn(A)\). The new objective is differentiable. Its optimality condition (taking \(A_{i,j} \geq 0\)) is \(B_{i,j} = A_{i,j} - l/2\); but the feasible set only includes \(sgn(B) = sgn(A)\). So if \(A_{i,j} - l/2 \leq 0\), the feasible B closest to the unconstrained minimizer has the corresponding \(B_{i,j} = 0\).
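
A minimal numpy sketch of the resulting soft-thresholding rule (the helper name is mine, not from the source):

```python
import numpy as np

def soft_threshold(A, l):
    """Entrywise minimizer of sum (B_ij - A_ij)^2 + l * |B_ij| (soft thresholding at l/2)."""
    return np.sign(A) * np.maximum(np.abs(A) - l / 2.0, 0.0)

A = np.array([[1.0, -0.3], [0.05, 2.0]])
print(soft_threshold(A, 1.0))   # entries with |A_ij| <= 0.5 vanish, the rest shrink by 0.5
```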

Best rank t approximation of A from SVD

The approximation

Let \(A_t = \sum_{j=1}^{t} \sigma_j u_j v_j^{T}\) be the approximation. \(A_t\) is the best rank t approximation to A (wrt \(\norm{.}_{2}\), \(\norm{.}_{F}\)) and captures the maximum possible energy of A: \(\norm{A-A_t} = \inf_{rank(B) \leq t} \norm{A-B}\).

Approximation error

Then, the approximation error is \(A - A_t = \sum_{i=t+1}^{r} \sigma_i u_i v_i^{T}\). As \(\norm{X}_2 = \sigma_1(X)\): \(\norm{A-A_{t}}_{2} = \sigma_{t+1}\), \(\norm{A-A_{t}}_{F} = \sqrt{\sum_{i=t+1}^{r} \sigma_{i}^{2}}\).
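
A short numpy check of these error formulas on an arbitrary matrix (a sketch, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

t = 2
A_t = U[:, :t] @ np.diag(s[:t]) @ Vt[:t, :]   # best rank-t approximation

assert np.isclose(np.linalg.norm(A - A_t, 2), s[t])                           # = sigma_{t+1}
assert np.isclose(np.linalg.norm(A - A_t, 'fro'), np.sqrt(np.sum(s[t:]**2)))  # tail energy
```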

Geometric interpretation

Approximate the hyperellipse (the image of the unit sphere under A) by a line segment, a 2-dimensional ellipse, etc.

Approximating domain and range spaces

Using the SVD, A can be viewed as a composition of rotations and a diagonal scaling. So, getting a low rank approximation of A can be viewed as first getting low-dimensional approximations of the range and domain spaces, with orthonormal basis matrices \(U_t\) and \(V_t\) respectively, and then finding a square \(S = U_t^{T} A V_t\) such that \(A \approx U_t S V_t^{T}\).
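
A short numpy sketch checking this view on a random matrix: with \(U_t\), \(V_t\) taken from the SVD, the core \(S = U_t^{T} A V_t\) is just the leading \(t \times t\) block of \(\Sigma\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

t = 2
U_t, V_t = U[:, :t], Vt[:t, :].T
S = U_t.T @ A @ V_t                       # square t x t core

assert np.allclose(S, np.diag(s[:t]))     # equals diag(sigma_1, ..., sigma_t)
assert np.allclose(U_t @ S @ V_t.T, U[:, :t] @ np.diag(s[:t]) @ Vt[:t, :])   # = A_t
```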

Proof

Proof by contradiction (⇒⇐): suppose some rank t matrix B had \(\norm{A-B}_2 < \sigma_{t+1}\). Then \(dim(N(B)) \geq n - t\), and for any \(w \in N(B)\), \(\norm{Aw} = \norm{(A-B)w} < \sigma_{t+1}\norm{w}\); but on the (t+1)-dimensional subspace \(span\{v_1, \dots, v_{t+1}\}\), \(\norm{Av} \geq \sigma_{t+1}\norm{v}\); and \(dim(N(B)) + (t+1) \geq n+1\), so the two subspaces intersect nontrivially: absurd.

So, $$\sigma_{k} = \min_{S \subset C^{n},\, \dim(S) = n-k+1} \ \max_{x \in S,\, \norm{x}_2 = 1} \norm{Ax}_{2} \ = \max_{S \subset C^{n},\, \dim(S) = k} \ \min_{x \in S,\, \norm{x}_2 = 1} \norm{Ax}_{2}.$$

Randomized Approach

Take the \(A \approx B = QQ^{T}A\) view. A good Q must (approximately) span the column space of \(U_k\). Can use something akin to the power method for finding eigenvectors. Take a random \(n \times k\) matrix W (for \(A \in R^{m \times n}\)), form \(Y = (AA^{T})^{q}AW = U\Sigma^{2q+1}V^{T}W\), and compute the QR factorization \(Y = QR\) to get Q. q = 4 or 5 is usually sufficient to get a good approximation of \(U_k\). (Tropp et al.) If you aim for a rank \(k+p\) approximation (oversampling by p), you get low expected error.
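
A sketch of this randomized range finder in numpy (the oversampling parameter p and power-iteration count q are my parameter names; in practice one re-orthonormalizes between power iterations for numerical stability):

```python
import numpy as np

def randomized_range_finder(A, k, p=5, q=4, seed=0):
    """Return an orthonormal Q whose span approximates the top-(k+p) range of A."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    W = rng.standard_normal((n, k + p))   # random test matrix
    Y = A @ W
    for _ in range(q):                    # power iterations: Y = (A A^T)^q A W
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                # orthonormal basis of the sampled range
    return Q

# B = Q @ (Q.T @ A) is then a near-best rank-(k + p) approximation of A.
```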

With few observations in A only

Same as the missing value estimation problem.

Missing value estimation

The problem

Maybe you have the matrix A, but you have observed only a few entries, indexed by the set O. If there were no other conditions, there would be infinitely many solutions. But maybe you also know that A has some special structure: low rank, or block structure, or smoothness, etc.

Can also view this as getting an approximation \(B \in S\) for A, where S is the set of matrices having the specified structure.

Smoothness constraint

Solve \(\min_B \sum_{i,j} d(B_{i,j}, B_{i-1,j}) + d(B_{i,j}, B_{i,j-1})\) such that \(B_{i,j} = A_{i,j} \ \forall (i,j) \in O\). If we use \(l_2\) or \(l_1\) metrics for d(), this can be solved with convex optimization, as in the sketch below.
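
A hedged sketch of the \(l_1\) version using the cvxpy modeling library (the mask construction and problem sizes are made up for illustration):

```python
import cvxpy as cp
import numpy as np

m, n = 6, 6
rng = np.random.default_rng(0)
A_obs = rng.standard_normal((m, n))           # observed values (entries outside O are unused)
M = (rng.random((m, n)) < 0.4).astype(float)  # 0/1 mask of the observed set O

B = cp.Variable((m, n))
smoothness = cp.sum(cp.abs(B[1:, :] - B[:-1, :])) + cp.sum(cp.abs(B[:, 1:] - B[:, :-1]))
constraints = [cp.multiply(M, B) == M * A_obs]   # B_{i,j} = A_{i,j} on observed entries
cp.Problem(cp.Minimize(smoothness), constraints).solve()
print(B.value)
```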

Low rank constraint

\(rank(B) \leq k\): this is not a convex constraint.

Singular value projection (SVP)

(Jain, Meka, Dhillon) Set \(B^{(0)}_{i,j} = A_{i,j} \ \forall (i,j) \in O\), and set the remaining values in \(B^{(0)}\) arbitrarily. Then, in iteration t, do an SVD of \(B^{(t)}\) to get the rank k approximation \(B^{(t+1)}\), and set \(B^{(t+1)}_{i,j} = A_{i,j} \ \forall (i,j) \in O\).
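
A simplified numpy sketch of this iteration, alternating SVD truncation with resetting the observed entries (the full SVP algorithm of Jain, Meka, Dhillon also takes a gradient step with a step size, omitted here):

```python
import numpy as np

def svp_complete(A_obs, mask, k, n_iters=100):
    """Fill in A given observed entries (mask == True), targeting a rank-k matrix."""
    B = np.where(mask, A_obs, 0.0)                    # arbitrary (zero) fill of unobserved entries
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # project onto rank-k matrices
        B[mask] = A_obs[mask]                         # restore observed entries
    return B
```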

Projection viewpoint

Take \(S_1 = \{B : B_{i,j} = A_{i,j} \ \forall (i,j) \in O\}\), \(S_2 = \{B : rank(B) \leq k\}\). You start with \(B^{(0)} \in S_1\), project it to the closest \(C \in S_2\), project C to the closest \(B^{(1)} \in S_1\), etc.

Applications

The Netflix problem.