Take a set of matrices S and a matrix A to approximate. Want \(B = \arg\min_{B \in S} \norm{B-A}\): the error is minimized wrt some norm \(\norm{\cdot}\) and some set S.
Also, \(\norm{\cdot}\) is often an operator norm, as this ensures that \(\norm{Bx - Ax}\) is low wrt the corresponding vector norm. Other times, as in the case of matrix completion problems, it may be desirable for \(\norm{\cdot}\) to be the Frobenius norm.
In many problems, S is the set of rank k matrices, where k is small.
Often, we prefer storing factors X, Y with \(B = XY^{*}\), rank(X) = rank(Y) = k, rather than computing B explicitly: A may be sparse, but the best B may be dense, so we may run out of memory while storing B, and out of time while computing Bx. You can always get X, Y from B just by taking the SVD of B.
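A minimal numpy sketch of this factored representation (the names `low_rank_factors` and `matvec` are mine, not from the notes): keep \(X = U_k \Sigma_k\) and \(Y = V_k\), so that Bx = X(Y^*x) costs O((m+n)k) instead of O(mn).

```python
import numpy as np

def low_rank_factors(A, k):
    """Rank-k factors X (m x k), Y (n x k) with X @ Y^* = best rank-k B."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    X = U[:, :k] * s[:k]           # absorb singular values into X
    Y = Vh[:k].T.conj()
    return X, Y

def matvec(X, Y, x):
    """Compute Bx = X (Y^* x) in O((m + n) k) time, never forming B."""
    return X @ (Y.T.conj() @ x)

# usage
A = np.random.rand(500, 300)
X, Y = low_rank_factors(A, k=10)
x = np.random.rand(300)
y = matvec(X, Y, x)
```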
Similarly, may want the restriction of A to a low-rank subspace: \(B = QQ^{*}A\), where Q is a low-rank orthogonal basis matrix; but it should be the best subspace, the one which minimizes the error. A good Q tries to approximate the range space of A.
Maybe want \(\min_B \norm{B-A}_F : \norm{B(:)}_0 \leq k\). This can be solved just by setting B = A, and then dropping all but the k largest-magnitude entries \(B_{i,j}\) from B.
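A short numpy sketch of this hard-thresholding step (the name `keep_k_largest` is mine):

```python
import numpy as np

def keep_k_largest(A, k):
    """Best ||B(:)||_0 <= k approximation in Frobenius norm: keep k largest |A_ij|."""
    B = np.zeros_like(A)
    idx = np.argpartition(np.abs(A).ravel(), -k)[-k:]  # indices of k largest magnitudes
    B.ravel()[idx] = A.ravel()[idx]
    return B
```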
Even the optimization problem \(\min_B \norm{B-A}_2^{2} + l\norm{B(:)}_1\) leads to sparse solutions. This form is useful in some sparse matrix completion ideas. Solution: soft-threshold A at l/2: take B = A, set \(B_{i,j} = 0\) wherever \(|A_{i,j}| \leq l/2\), and shrink the remaining entries toward zero by l/2:
Pf: the problem decouples over entries. Let B’ be the solution to this. If \(A_{i,j} \geq 0\), so is \(B'_{i,j}\); if \(A_{i,j} \leq 0\), so is \(B'_{i,j}\): if they oppose in sign, setting \(B'_{i,j} = 0\) definitely lowers the objective. So, get the equivalent problem: \(\min_B \norm{B-A}_2^2 + l \sum_{i,j} |B_{i,j}|\) subject to \(\mathrm{sign}(B_{i,j}) = \mathrm{sign}(A_{i,j})\). The new objective is differentiable. Its optimality condition: \(B_{i,j} = A_{i,j} - (l/2)\,\mathrm{sign}(A_{i,j})\); but the feasible set only includes B matching A in sign. So if \(|A_{i,j}| \leq l/2\), the feasible B closest to the unconstrained minimum has the corresponding \(B_{i,j} = 0\).
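A minimal numpy sketch of this soft-thresholding solution (the name `soft_threshold` is mine):

```python
import numpy as np

def soft_threshold(A, l):
    """Entrywise minimizer of ||B - A||_F^2 + l * ||B(:)||_1."""
    # zero out entries with |A_ij| <= l/2, shrink the rest toward zero by l/2
    return np.sign(A) * np.maximum(np.abs(A) - l / 2.0, 0.0)
```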
Let \(A_{t} = \sum_{i=1}^{t} \sw_{i} u_{i} v_{i}^{*}\) be the truncated-SVD approximation. \(A_t\) is the best rank-t approx to A (wrt \(\norm{\cdot}_{2}\), \(\norm{\cdot}_{F}\)); it captures the max energy of A possible: \(\norm{A_t}_F^2 = \sum_{i=1}^{t} \sw_i^2\).
Then, the approximation error is easy to read off. As \(A - A_{t} = \sum_{i=t+1}^{r} \sw_{i} u_{i} v_{i}^{*}\): \(\norm{A-A_{t}}_{2} = \sw_{t+1}\), \(\norm{A-A_{t}}_{F} = \sqrt{\sum_{i=t+1}^{r} \sw_{i}^{2}}\).
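A quick numpy check of these two error formulas (a sketch; `truncated_svd` is my name):

```python
import numpy as np

def truncated_svd(A, t):
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :t] * s[:t]) @ Vh[:t]

A = np.random.rand(40, 30)
t = 5
At = truncated_svd(A, t)
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A - At, 2), s[t])      # spectral error = sigma_{t+1}
assert np.isclose(np.linalg.norm(A - At, 'fro'),
                  np.sqrt(np.sum(s[t:] ** 2)))          # Frobenius error
```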
Geometrically: A maps the unit sphere to a hyperellipse, and the rank-t approximation replaces that hyperellipse by a line (t = 1), a 2-dim ellipse (t = 2), etc.
Approximating domain and range spaces
Using the SVD, for example, \(A = U \Sigma V^{*}\) can be viewed as a combination of rotation and diagonal matrices. So, getting a low rank approximation of A can be viewed as first getting low rank approximations of the range and domain spaces with orthogonal basis matrices \(Q_1 \in C^{m \times t}\) and \(Q_2 \in C^{n \times t}\) respectively, and then finding a square \(C \in C^{t \times t}\) such that \(B = Q_1 C Q_2^{*} \approx A\).
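A numpy sketch of this view (assuming the truncated SVD as the source of the bases; variable names follow the text): with \(Q_1 = U_t\), \(Q_2 = V_t\), \(C = \Sigma_t\), the product \(Q_1 C Q_2^{*}\) recovers the best rank-t approximation.

```python
import numpy as np

A = np.random.rand(40, 30)
t = 5
U, s, Vh = np.linalg.svd(A, full_matrices=False)
Q1 = U[:, :t]                 # orthonormal basis for the range side
Q2 = Vh[:t].T.conj()          # orthonormal basis for the domain side
C = np.diag(s[:t])            # square t x t core
B = Q1 @ C @ Q2.T.conj()      # equals the truncated-SVD approximation A_t
```

For any fixed orthonormal \(Q_1, Q_2\), the best core in Frobenius norm is \(C = Q_1^{*} A Q_2\).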
Proof by contradiction (that no rank-t B beats \(A_t\) in \(\norm{\cdot}_2\)): suppose \(\norm{A-B}_2 < \sw_{t+1}\) with rank(B) = t. Then dim(N(B)) = n − t, so N(B) intersects the (t+1)-dimensional subspace \(\mathrm{span}\{v_1, \ldots, v_{t+1}\}\) in some unit vector x; but for such x, \(\norm{(A-B)x}_2 = \norm{Ax}_2 \geq \sw_{t+1}\): absurd.
So,
$$\sw_{k} = \min_{S \subset C^{n},\ \dim(S) = n-k+1}\ \max_{x \in S,\ \norm{x}_{2} = 1} \norm{Ax}_{2} \\
= \max_{S \subset C^{n},\ \dim(S) = k}\ \min_{x \in S,\ \norm{x}_{2} = 1} \norm{Ax}_{2}.$$
Randomized Approach
Take the \(B = QQ^{*}A\) view: a good Q must span range(A) approximately. Can use something akin to the power method for finding eigenvectors. Take a random n×k matrix W, form \(Y = (AA^{*})^{q} A W\), and take the QR decomposition of Y to get Q. q = 4 or 5 is sufficient to get a good approximation of range(A). (Tropp et al.): if you aim for a rank k+p approximation (oversample with p extra random columns), you get low expected error.
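A minimal numpy sketch of this randomized range finder (a sketch under the \(A \in R^{m \times n}\) convention; the function name and the re-orthonormalization between power steps, done for numerical stability, are my choices):

```python
import numpy as np

def randomized_range_finder(A, k, p=5, q=4, seed=0):
    """Return Q (m x (k+p)) whose columns approximately span range(A).

    Implements Y = (A A^*)^q A W followed by QR, re-orthonormalizing
    between power steps so roundoff does not wash out small directions.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.standard_normal((n, k + p))   # oversample by p columns
    Q, _ = np.linalg.qr(A @ W)
    for _ in range(q):                    # power iterations sharpen the spectrum
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# usage: low rank approximation B = Q (Q^* A)
A = np.random.rand(200, 100) @ np.random.rand(100, 150)
Q = randomized_range_finder(A, k=20)
B = Q @ (Q.T @ A)
```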
Matrix Completion

This is the same as the missing value estimation problem.
Maybe you have the matrix A, but you have observed only a few entries \(A_{i,j},\ (i,j) \in O\). If there were no other conditions, there would be infinitely many solutions. But maybe you also know that A has some special structure: low rank, or block structure, or smoothness, etc.
Can also view this as getting an approximation \(B \in S\) for A, where S is the set of matrices having the specified structure.
Solve \(\min_B \sum_{(i,j) \in O} d(B_{i,j}, A_{i,j})\) such that \(B \in S\). If we use l2 or l1 metrics for d(), this can be solved with convex optimization, provided S is convex.
But \(S = \{B : \mathrm{rank}(B) \leq k\}\): this is not a convex constraint.
(Jain, Meka, Dhillon) Set \(B^{(0)}_{i,j} = A_{i,j}\ \forall (i,j) \in O\), set the remaining values in \(B^{(0)}\) arbitrarily. Then, in iteration i, do the SVD of \(B^{(i)}\) to get its rank-k approximation, and take that as \(B^{(i+1)}\) after resetting \(B^{(i+1)}_{i,j} = A_{i,j}\ \forall (i,j) \in O\).
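A minimal numpy sketch of this alternating scheme (a sketch of the iteration as described here, not a faithful implementation of Jain–Meka–Dhillon; names are mine):

```python
import numpy as np

def complete_by_alternating_projection(A_obs, mask, k, iters=100):
    """mask[i, j] == True where A_{ij} is observed (the set O)."""
    B = np.where(mask, A_obs, 0.0)          # arbitrary fill for unobserved: zeros
    for _ in range(iters):
        U, s, Vh = np.linalg.svd(B, full_matrices=False)
        B = (U[:, :k] * s[:k]) @ Vh[:k]     # project onto the rank-k matrices
        B[mask] = A_obs[mask]               # re-impose the observed entries
    return B

# usage on a synthetic rank-8 matrix with ~50% of entries observed
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 50))
mask = rng.random(A.shape) < 0.5
B = complete_by_alternating_projection(A, mask, k=8)
```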
Example: the Netflix problem, etc.