Misc decompositions

Importance of decompositions

Very important for restating and understanding the behavior of a linear operator. Also important in solving problems: get the decomposition once, use it repeatedly. For algebraic manipulation: factor the matrix: QR, LU, eigenvalue decomposition, SVD.

EW revealing decompositions

See section on eigenvalues.

Eigenvalue decomposition

Aka Spectral Decomp. Only if A diagonalizable.

Take S: eigenvectors-as-columns matrix, with independent columns; \(\EW\): eigenvalue diagonal matrix. Then \(AS = S\EW\); so \(S^{-1}AS=\EW\); \(A=S\EW S^{-1}\): a similarity transformation. Also, if \(AS=S\EW\), S's columns must be eigenvectors.

A diagonalized into Λ. A and Λ are similar.
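
A minimal numpy sketch (illustrative, not part of the notes) checking \(AS = S\EW\) and \(A = S\EW S^{-1}\) on a random matrix, which is diagonalizable with probability 1:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))       # a generic real matrix; diagonalizable a.s.

lam, S = np.linalg.eig(A)             # columns of S: right eigenvectors
Lam = np.diag(lam)

assert np.allclose(A @ S, S @ Lam)                   # AS = S Lambda
assert np.allclose(A, S @ Lam @ np.linalg.inv(S))    # A  = S Lambda S^{-1}
\end{verbatim}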

Non-defectiveness connection

\(A=S\EW S^{-1}\) iff A is non-defective: if \(A=S\EW S^{-1}\): \(\EW\) is diagonal, hence non-defective, and A, being similar to it, is non-defective too; if A is non-defective: can collect n independent eigenvectors into a non-singular S; thence \(A=S\EW S^{-1}\).

Left ev

\(S^{-1}AS=\EW\), so \(S^{-1}A=\EW S^{-1}\). So, the rows of \(L = S^{-1}\) are the left ev.

Change of basis to ev

\(x=SS^{-1}x=SLx=\sum_i (L_{i,:}x)\, s_i\). Thus, any x can be conveniently rewritten in terms of the right ev, with the magnitudes of the components given by the left ev.

Unitary (Orth) diagonalizability

For \(A=A^{*}\), \(A=-A^{*}\), unitary Q etc..

A is unitarily diagonalizable iff it is normal (\(A^{*}A = AA^{*}\)): from uniqueness of the SVD, \(U\SW^{2}U^{*} = V\SW^{2}V^{*}\); so |U|=|V|; U and V may only differ in sign, column by column. So, for some \(\EW\) with \(|\EW|=\SW\), \(A=U\EW U^{*}\). Aka Spectral theorem.

\htext{\(A = QUQ^{*}\)}{Schur} factorization

(Schur). A and upper triangular U are similar; all ew on U’s diagonal. If \(A=A^{*}\), \(U=U^{*}\), so U is diagonal.

Every A has \(QUQ^{*}\): by induction: assume true for smaller sizes; take any \(\ew\) and the corresponding eigenspace \(V_{\ew}\), along with \(V_{\ew}^{\perp}\); use orthonormal vectors from these spaces as a basis; in this basis, the operator represented by A has matrix representation \(A'=\mat{\ew I & B\\ O & C} = Q^{*}AQ\); can then repeat the process for C.

Singular Value Decomposition (SVD)

Reduced (Thin) SVD

If A is m×n with m>n and rank r = n: take unitary n×n \(V=[v_i]\), m×n \(\hat{U}=[u_i]\) with orthonormal columns, and n×n diagonal \(\hat{\SW}\) with \(\sw_i \geq 0\) in descending order, such that \(AV=\hat{U}\hat{\SW}\); then \(A=\hat{U}\hat{\SW}V^{*}\). If r<n: still get a reduced SVD by padding V with orthonormal vectors, \(\hat{U}\) with appropriate vectors, and \(\hat{\SW}\) with zero-diagonal columns.

Full SVD

Pad \(\hat{U}\) with m-r further orthonormal vectors and \(\hat{\SW}\) with 0’s to make U m×m and invertible; V stays the same; \(A=U\SW V^{*}\). So, SVD for m<n: take \(A^{*}=V\SW U^{*}\). So, \(A^{*}U=V\SW^{*}\). Also, \(Av_i=\sw_i u_i\). So, \(range(A)=span\{u_{1} \dots u_{r}\}\), \(range(A^{*})=span\{v_{1} \dots v_{r}\}\), \(N(A)=span\{v_{r+1} \dots v_{n}\}\) (as \(Av_{r+1}=0\)), \(N(A^{*})=span\{u_{r+1} \dots u_{m}\}\).
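
A small numpy sketch (assuming numpy's convention \(A = U\,diag(\sw)\,V^{*}\), with Vh playing the role of \(V^{*}\)) contrasting the reduced and full forms and checking that the trailing columns of U span \(N(A^{*})\):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))       # 6x4, full column rank a.s. (r = n = 4)

# Reduced (thin) SVD: U_hat is 6x4, sigma has 4 entries, Vh is 4x4.
U_hat, sigma, Vh = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U_hat @ np.diag(sigma) @ Vh)

# Full SVD: U is 6x6; its trailing columns span N(A^*).
U, sigma, Vh = np.linalg.svd(A, full_matrices=True)
assert np.allclose(A.T @ U[:, 4:], 0)
\end{verbatim}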

Geometric view

Take \(Ax = U\SW V^{*}x\): \(V^{*}\) rotates the unit ball to the unit ball (\(v_i \to e_i\)), \(\SW\) stretches the unit ball along the axes to make it a hyperellipse, U rotates it. Every A with an SVD maps the unit ball to a hyperellipse (eqn: \(\sum_i x_i^{2}/\sw_i^{2}=1\)): the orthonormal \(v_i\) (right singular vectors) are mapped to the orthogonal \(\sw_i u_i\) (principal semiaxes: orthonormal left singular vectors × singular values). So \(\sw_{1} = \norm{A}_{2}\) and, for invertible A, \(\sw_{n} = 1/\norm{A^{-1}}_{2}\).

From the geometric action of \(U\SW V^{*}x\), every A with an SVD is a diagonal matrix when the domain and range bases are changed (see Ax=b as \(A(Vx')=Ub'\), then \(\SW x'=b'\)). ‘If ye want to understand A, take its SVD.’

Existence

Every A has an SVD: by induction; prove the 1×1 case; assume the (m-1)×(n-1) case \(B=U_{2}\SW_{2}V_{2}^{*}\); take m×n A; \(\sw_{1} = \norm{A}_{2} = \sup \norm{Av}_{2}\); the sup is attained, so there is a unit vector \(v_1\) with \(Av_1=\sw_1 u_1\) for some unit \(u_1\); so extend \(v_1\) and \(u_1\) to unitary \(V_1\) and \(U_1\); make \(\SW_1 = U_1^{*}AV_1\); its 1st column is \(\sw_1 e_1\); as \(\norm{\SW_1}_{2}=\norm{A}_{2}=\sw_1\), the non-diagonal elements of the 1st row have got to be 0; the remaining block of \(\SW_1\) is B.

Conditional uniqueness up to a sign

SVD is unique if the \(\{\sw_i\}\) are distinct (‘up to sign’): write the hyperellipse semiaxes in descending order; but one can reverse the signs of a corresponding pair \(u_i\), \(v_i\), or multiply them by any suitable pair of complex roots of 1.

Singular value properties

See another section.

Finding SVD using EVD

Use eigenvalue decompositions: \(AA^{*} = U \SW^{2}U^{*}\), and \(A^{*}A = V \SW^{2}V^{*}\). Otherwise, find the eigenvalue decomposition of \(B=\mat{0 & A\\ A^{*} & 0}\): then ew(B) are composed of zeros and sw(A) repeated with different signs. The ev of B are \((\sqrt{2})^{-1}\mat{U_n & U_n & \sqrt{2}U_{m-n}\\ V & -V & 0}\).
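
A quick numpy check of the \(A^{*}A\) route (a sketch only; explicitly forming \(A^{*}A\) squares the condition number, so it is a poor way to compute an SVD in practice):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

w, V = np.linalg.eigh(A.T @ A)                     # eigenvalues of A^T A, ascending
sv_from_evd = np.sqrt(np.clip(w[::-1], 0, None))   # singular values, descending
sv_direct = np.linalg.svd(A, compute_uv=False)
assert np.allclose(sv_from_evd, sv_direct)
\end{verbatim}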

Polar decomposition

\(m \leq n\): take the SVD \(A = U [\SW\ 0] [V_{1}\ V_{2}]^{*} = U \SW V_{1}^{*}\). \(P^{2} = AA^{*} = U \SW^{2}U^{*}\): +ve semidefinite; take \(P = U \SW U^{*}\): Hermitian, +ve semidefinite. So, \(A = U \SW V_{1}^{*} = PUV_{1}^{*} = PY\), where \(Y=UV_{1}^{*}\) has orthonormal rows.

So, if \(m \geq n\): \(A=YQ\) for Hermitian +ve semidefinite Q and Y with orthonormal columns: apply the theorem to \(A^{*}\).
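
A numpy sketch (square case for simplicity; not from the notes) building P and Y from the SVD:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

U, s, Vh = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T              # Hermitian, +ve semidefinite factor
Y = U @ Vh                            # orthogonal factor
assert np.allclose(A, P @ Y)          # A = P Y
\end{verbatim}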

PA = LU

Here unit lower triangular L, upper triangular U.

Can also make: PA = LDU: for unit lower triangular L, unit upper triangular U, diagonal D.

Existence and uniqueness

Existence

See the triangularization by row elimination algorithm in the numerical analysis ref. That this runs to completion proves existence.

Uniqueness if P=I

A = LU is unique: else, if \(LU=L'U'\), then \(L'^{-1}L=U'U^{-1}\): a unit lower triangular matrix equal to an upper triangular one, forcing both to be I, i.e. L=L', U=U': absurd. So A=LDU is unique.

For hermitian positive definite matrices

As \(A \succ 0\), P = I (no pivoting is needed).

As \(A = LDU = A^{*}\), can take \(A = RR^{*}\), where \(R = LD^{1/2}\) (Cholesky). It is also unique: \(r_{j,j}=\sqrt{d_{j,j}}>0\) is fixed by definition; it in turn fixes the rest of R.
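
A numpy sketch of the Cholesky factor; note that numpy's convention returns lower triangular L with \(A = LL^{*}\), i.e. \(L = R^{*}\) in the notation above:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)           # Hermitian positive definite

L = np.linalg.cholesky(A)
assert np.allclose(A, L @ L.T)
assert np.all(np.diag(L) > 0)         # positive diagonal pins down uniqueness
\end{verbatim}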

Importance

Very useful in solving linear equations involving the same matrix A: can store L, U for repeated reuse.
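
A sketch of the reuse pattern, assuming scipy is available (lu_factor stores the PA = LU factorization once; lu_solve reuses it per right-hand side):

\begin{verbatim}
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

lu, piv = lu_factor(A)                # factor once: PA = LU
for _ in range(3):                    # reuse for many right-hand sides
    b = rng.standard_normal(4)
    x = lu_solve((lu, piv), b)
    assert np.allclose(A @ x, b)
\end{verbatim}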

A = QR = LQ’

Express columns of A as linear combinations of orthonormal \(\{q_i\}\). For a proof of existence, see the triangular orthonormalization algorithm in the numerical analysis ref.

Taking the QR factorization of \(A^{T}\), you also get \(A=LQ^{T}\), where L is lower triangular.

Importance

Often, we need to get an orthogonal basis for range(A).
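
A one-line route in numpy (a sketch; the reduced QR) producing such a basis:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 3))

Q, R = np.linalg.qr(A)                   # Q: 6x3, R: 3x3 upper triangular
assert np.allclose(Q.T @ Q, np.eye(3))   # columns of Q: orthonormal basis of range(A)
assert np.allclose(A, Q @ R)
\end{verbatim}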

Column Rank deficient A

If A were rank deficient, multiple columns would be linear combinations of the same set of \(q_i\)'s. As Q is square (full QR), R would then have 0 rows.

Rank revealing QR

In such cases, by permuting the columns of A suitably, we can ensure that the 0 rows appear at the bottom, revealing the rank.

Factorization of Hermitian matrices

Unitary diagonalizability, SVD

From the Schur factorization: can write \(A=Q \EW Q^{*}\). So, A has a full set of orthonormal eigenvectors. So, can write: \(A = \sum_i \ew_i q_i q_i^{*}\).

Also, singular values \(\sw_i=|\ew_i|\), but can’t write \(A = U \SW V^{*} = U \SW U^{*}\): there may be a sign difference between U’s and V’s columns due to \(\ew_i<0\).

Symmetric LDU factorization

(Cholesky). \(A = R^{*}R\). As \(A = LDU = (LDU)^{*}=U^{*}DL^{*}\), \(L=U^{*}\). So, \(A = LDL^{*} = LD^{1/2}D^{1/2}L^{*} = R^{*}R\) with \(R=D^{1/2}L^{*}\); \(d_{j,j}>0\) as the leading principal submatrices are +ve definite; \(r_{j,j}=\sqrt{d_{j,j}}>0\) chosen.

By the SVD, \(\norm{R}_{2}^{2}=\norm{A}_{2}\): \(\sw_i(R)^{2}=\ew_i(R^{*}R)=\ew_i(A)\).

Square root of semidefinite A

\(A = (A^{1/2})^{*}A^{1/2}\). Diagonalize, get \(A = Q\EW Q^{*}\); \(A^{1/2}=Q\EW^{1/2}Q^{*}\): the unique +ve semidefinite solution.
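
A numpy sketch of the eigendecomposition route (the clip guards against tiny negative eigenvalues from roundoff):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = B @ B.T                            # +ve semidefinite

w, Q = np.linalg.eigh(A)               # A = Q diag(w) Q^T
A_half = Q @ np.diag(np.sqrt(np.clip(w, 0, None))) @ Q.T
assert np.allclose(A_half @ A_half, A)
\end{verbatim}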

Special linear operators

Orthogonal (Unitary) m×n matrix

Columns orthonormal: \(Q^{*}Q=I\); and \(m \geq n\).

Change of basis operation

\(Qx=b\): \(x=Q^{*}b\): so, x has the magnitudes of the projections of b on the q’s: a change of basis op.

Alternative view: \(Q^{*}(\sum_i a_i q_i)=\sum_i a_i e_i\).

Preservation of angles and 2 norms

So, \(\dprod{Qa, Qb} = b^{*}Q^{*}Qa = \dprod{a,b}\). Also, \(\norm{Qx}=\norm{x}\): so, length and angle are preserved; analogous to \(z \in C\) with |z|=1. Conversely, if \(\norm{Qx}=\norm{x}\) for all x, \(Q^{*}Q=I\).

Square Unitary matrix

Orthogonality of rows

If Q is square, even the rows are orthonormal: \(Q^{*}Q=I \implies QQ^{*}=I\): \(q_i^{*}q_j=\delta_{i,j}\) (Kronecker \(\delta\) = 1 iff i=j, else 0).

Rotation + reflection

Being a change of basis operation, by geometry, Q is a rotation, a reflection, or a combination thereof.

The distinction between orthogonal matrices constructed purely out of rotations (proper rotations) and those which involve reflections (improper rotations) is important in geometric computations: in applications such as robotics, computational biology etc..

Determinant

\(det(Q)=\pm 1\): \(det(Q^{*}Q)=det(I)=1\).

Q is a rotation if \(det(Q)=1\), and involves a reflection if \(det(Q)=-1\): true for m=2; for m>2, see that the determinant is multiplicative.

Permutation matrix P

A permuted I. Permutes rows (PA) or columns (AP). Partial permutation matrix: every row and column has at most one nz value.

Rotation matrix

To make a rotation matrix, take a new orthonormal basis \((u_i)\): the coordinate system is rotated, \(e_i \to u_i\); get the matrix U. \((U^{*}x)_i = u_i^{*}x\): x rotated into the new coordinates. Note: \(U^{*}u_i=e_i\).

Reflection matrix

Take a reflection across the hyperplane orthogonal to some basis vector (not an arbitrary axis). This is just I with a -1 in place of some 1.

Linear Projector P to a subspace S

Using General projection definition

See definition of the generalized projection operation in vector spaces ref. Here, we consider the case where projection happens to be a linear operator: that is, it corresponds to the minimization of a convex quadratic vector functional, where the feasible set is a vector space, the range space of the projector P.

Definition for the linear case

P such that \(P^{2}=P\): so, a vector already in S is not affected.

(I-P) projects to N(P): if Pv=0, (I-P)v=v; and vice versa. Rank r projectors project to an r-dimensional space.

Oblique projectors project along non orthogonal basis.

Orthogonal projector

Here, \((I-P)x \perp Px\). Eg: projectors which arise from solving the least squares problem. Ortho-projectors \(\neq\) orthogonal matrices.

If \(P=P^{*}\), P is an orth projector: if \(P=P^{*}\), \(\dprod{(I-P)x, Px}=0\). Conversely, if P is an orth proj: make orthonormal bases for range(P) and N(P); get Q; now \(PQ=Q\SW\), with \(\sw_i\) 1 or 0 suitably: an SVD! So, if P is an orth proj, \(P=P^{*}\).

Ergo, (I-P) is also an orth proj. Also, \(P = \hat{Q}\hat{Q}^{*}\) (as \(A=\hat{Q}\hat{R}\)): also from \(v = r + \sum_i (q_{i}q_{i}^{*})v\). All \(\hat{Q}\hat{Q}^{*}\) are orth projectors: they satisfy the defining properties.

\(\norm{P}_{2}=1\). \why
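
A numpy sketch building \(P=\hat{Q}\hat{Q}^{*}\) from a reduced QR and checking the projector properties:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 2))
Q, _ = np.linalg.qr(A)                 # orthonormal basis for range(A)
P = Q @ Q.T                            # orthogonal projector onto range(A)

x = rng.standard_normal(5)
assert np.allclose(P @ P, P)           # idempotent
assert np.allclose(P, P.T)             # Hermitian
assert np.isclose((P @ x) @ (x - P @ x), 0)   # Px is orthogonal to (I - P)x
\end{verbatim}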

Hermitian matrix

Aka Self Adjoint Operator. Symmetric matrix: \(A=A^{T}\). It generalizes to the Hermitian matrix \(A=A^{*}\); analogous to \(R \subset C\). Not all (complex) symmetric matrices are Hermitian.

Notation: symmetric matrices in \(R^{n \times n}\): \(S^{n}\); +ve definite among them: \(S^{n}_{++}\).

Skew/anti-symmetric matrix: \(A=-A^{T}\); generalizes to skew Hermitian (\(A=-A^{*}\)).

\(\dprod{Ax, y} = y^{*}Ax = \dprod{x, A^{*}y}\).

Importance

Appears often in analysis: any \(B = \frac{B+B^{*}}{2} + \frac{B-B^{*}}{2}\): Hermitian + skew Hermitian. Also in projectors.

Many applications: Eg: covariance matrix, adjacency matrix, kernel matrix.

Self adjointness under M

For SPD A, M: \(M^{-1}A\) is self adjoint under M, as \(\dprod{x, M^{-1}Ay}_{M} = y^{*}Ax = \dprod{M^{-1}Ax, y}_{M}\).

Eigenvalues (ew)

All eigenvalues are real: \(\conj{\ew}x^{*}x = (\ew x)^{*}x = (Ax)^{*}x = x^{*}(Ax) = \ew x^{*}x\), so \(\conj{\ew}=\ew\).

ev of distinct ew are orthogonal: \(x_{2}^{*}Ax_{1} = \ew_{1}x_{2}^{*}x_{1} = \ew_{2}x_{2}^{*}x_{1}\), \(\therefore x_{2}^{*}x_{1}(\ew_{1}-\ew_{2}) = 0\).

sw and norm

Unitary diagonalizability is possible for A: see the section on factorization of Hermitian matrices. Thence, \(|\ew(A)|=\sw(A)\); so \(|\ew_{max}(A)|=\norm{A}_{2}\).

Factorizations

For details, see section on factorization of hermitian matrices.

Skewed inner prod \htext{\(x^{*}Ax\)}{..}

\(x^{*}Ay = (y^{*}Ax)^{*}\). So, \(x^{*}Ax = \conj{x^{*}Ax}\); so \(x^{*}Ax\) is real.

+ve definiteness

Definition

If \(\forall 0 \neq x \in C^{n}: x^{*}Ax \in R, x^{*}Ax \geq 0\), A is +ve semi-definite, or non-negative definite.

If \(x^{*}Ax>0\), +ve definite: \(A \succ 0\).

Similarly, -ve (semi-)definite defined.

Importance

Important because Hessians of convex quadratic functionals are +ve semidefinite. Also, it is important because of its connections with ew(A).

+ve semidefinite cone

The set of +ve semidefinite matrices is a proper cone. If restricted to symmetric matrices, we get \(S^{n}_{+}\).

Matrix inequalities

Hence, inequalities wrt this cone are defined. (Can write \(A \succ 0\) to say A is +ve def.) This is what is usually meant when dealing with +ve semidefinite matrices - not elementwise inequalities.

Linear matrix inequality (LMI)

\(A_{0}+\sum_i x_{i}A_{i} \succeq 0\). Note that this is equivalent to having \(A \succeq 0\) with entries of the form \(A_{i,j}=a_{ij}^{T}x+b_{ij}\). Used in defining semidefinite programming.

Analogy with reals

Hermitians are analogous to R within C; +ve semidef like \(\{0\} \cup R_{+}\); +ve def like \(R_{+}\).

Support number of a pencil

\(s(A,B)=\arg\min \{t: tB-A \succeq 0\}\).

Non hermitian examples

Need not be Hermitian always. Then, as \(x^{*}Bx = x^{*}B^{*}x\), the anti-symmetric part in \(B = \frac{B+B^{*}}{2} + \frac{B-B^{*}}{2}\) has no effect.

Diagonal elements, blocks

\(e_i^{*}Ae_i=a_{i,i}\). So, \(a_{i,i}\) is real. \(a_{i,i} \geq 0\) if A is +ve semidefinite; \(a_{i,i}>0\) if A is +ve definite; but the converse is untrue.

Similarly, for \(X \in C^{m \times n}\) with full column rank: \(X^{*}AX\) has the same +ve definiteness as A. Taking X composed of \(e_i\)'s, any principal submatrix \(A_{I,I}\) can be writ as \(X^{*}AX\); so \(A_{I,I}\) has the same positive definiteness as A.

Off-diagonal block signs flippable

Off-diagonal block signs can be flipped without losing +ve semidefiniteness. Pf: if \(\mat{x^{T} & y^{T}}\mat{A & B\\ C & D}\mat{x\\ y} \geq 0\ \forall \mat{x\\ y}\), then \(\mat{x^{T} & y^{T}}\mat{A & -B\\ -C & D}\mat{x\\ y} = \mat{x^{T} & (-y)^{T}}\mat{A & B\\ C & D}\mat{x\\ -y} \geq 0\).

Eigenvalues: Real, +ve?

\(\forall i: \ew_i \in R\): take an ev x; \(x^{*}Ax=\ew x^{*}x\) must be real (and comparable with 0), and \(x^{*}x>0\), so \(\ew\) is real.

If \(A \succeq 0\), ew \(\ew_i \geq 0\): \(\ew_i x^{*}x = x^{*}Ax \geq 0\). Also, if \(A \succ 0\), \(\ew_i>0\).

Determinant

\(det(A)=\prod_i \ew_i \geq 0\).

+ve inner products

For +ve (semi)definite matrices, get +ve inner products: take eigenvalue decompositions \(A=\sum_i \ew_i q_i q_i^{T}\), \(B=\sum_j l_j p_j p_j^{T}\); then \(\dprod{A,B}=\sum_{i,j}\ew_i l_j (q_i^{T}p_j)^{2} \geq 0\).

So, the +ve definite cone is self dual.

Invertibility

If A is +ve def., A is invertible: \(\forall x \neq 0: x^{*}Ax \neq 0\), so \(Ax \neq 0\); so A has no nullspace. If A is +ve semi-def, can’t say this.

Hermitian +ve definiteness

Also, see properties of not-necessarily symmetric +ve semidefinite matrices.

From general matrices

Any \(B^{*}B\) or \(BB^{*}\) is Hermitian, +ve semidefinite: \(x^{*}B^{*}Bx = \norm{Bx}^{2}\). So, if B is invertible, \(B^{*}B\) is +ve definite. So, if B is long and thin (with independent columns), \(B^{*}B\) is +ve definite; but if B is short and fat, so singular, \(B^{*}B\) is +ve semi-definite and also singular.

Connection to ew

If \(A = A^{*}\) and all eigenvalues \(\ew>0\), then A is +ve definite: \(x^{*}Ax \in R\), and \(x^{*}Ax = x^{*}U\EW U^{*}x = \sum_i \ew_{i}|(U^{*}x)_{i}|^{2} > 0\).

Magnitudes of ew same as that of sw: as you can easily derive SVD from eigenvalue decomposition. So, singular value properties carry over.

Connection to the diagonal

Diagonal dominance

If \(A=A^{*}\), and diagonal dominance and non-negativity of the \(A_{i,i}\) also hold, then A is +ve semidefinite. See the diagonally dominant matrices section for a proof.

Diagonal heaviness

The biggest element of A is the biggest diagonal element: for some k, \(a_{k,k} \geq a_{i,j}\ \forall i,j\). Pf: suppose an off-diagonal \(a_{i,j}\) exceeds every diagonal element, in particular \(a_{i,i}\) and \(a_{j,j}\); then consider the submatrix \(B=\mat{a_{i,i} & a_{i,j}\\ a_{i,j} & a_{j,j}}\); \(B \succeq 0\), but due to the assumption \(|B| = a_{i,i}a_{j,j}-a_{i,j}^{2} < 0\), hence \(\Rightarrow\Leftarrow\).

Check for +ve (semi) definiteness

Do Gaussian elimination, see if the pivots are \(\geq 0\). \(x^{*}Ax = x^{*}LDL^{*}x\); if the pivots are good, can say \(= \norm{D^{1/2}L^{*}x}^{2} \geq 0\).
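
A common practical check, sketched in numpy (relies on the fact that Cholesky succeeds exactly when all pivots of a Hermitian matrix are positive):

\begin{verbatim}
import numpy as np

def is_positive_definite(A):
    """Rough check: Hermitian A is +ve definite iff Cholesky succeeds."""
    if not np.allclose(A, A.conj().T):
        return False
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite(np.array([[2.0, -1.0], [-1.0, 2.0]])))   # True
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))     # False
\end{verbatim}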

Block matrices: Schur complement connection

Take \(X=\mat{A & B\\ B^{T} & C}\), \(S=C-B^{T}A^{-1}B\). Then, if \(A \succ 0\): \(X \succeq 0 \iff S \succeq 0\). Also, \(X \succ 0 \iff (A \succ 0 \wedge S \succ 0)\).

Proof: rewrite as optimization problem

Take \(f(u,v)=u^{T}Au+2v^{T}B^{T}u+v^{T}Cv=\mat{u^{T} & v^{T}}X\mat{u\\ v}\). Solve \(\min_u f(u,v)\). By setting \(\nabla_u f(u,v)=0\), get the minimizer \(u^{*}=-A^{-1}Bv\), and \(f(u^{*},v)=v^{T}Sv\).
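
A quick numeric check of the criterion (a sketch; the blocks are chosen arbitrarily so that \(A \succ 0\) and S is +ve definite):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(9)
A = 2.0 * np.eye(3)
B = rng.standard_normal((3, 2))
C = B.T @ np.linalg.solve(A, B) + 0.5 * np.eye(2)   # makes S = 0.5 I

X = np.block([[A, B], [B.T, C]])
S = C - B.T @ np.linalg.solve(A, B)

# With A > 0: X is positive definite exactly when S is.
print(np.all(np.linalg.eigvalsh(S) > 0), np.all(np.linalg.eigvalsh(X) > 0))
\end{verbatim}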

+ve semidefinite cone

Denoted by \(S^{n}_{++}\) (definite) and \(S^{n}_{+}\) (semidefinite).

Self duality

If \(A,B \succeq 0\), \(\dprod{A, B} = \dprod{\sum_i \ew_i q_i q_i^{*}, \sum_j \ew'_j p_j p_j^{*}} = \sum_{i,j}\ew_i\ew'_j|q_i^{*}p_j|^{2} \geq 0\). So, the dual of \(S^{n}_{+}\) is itself.

When you consider the dual of a semidefinite program, this is important.

Speciality of the diagonal

Diagonally dominant matrix

\(|A_{i,i}| \geq \sum_{j \neq i}|A_{i,j}|\ \forall i\).

Hermitian-ness and +ve semidefiniteness

A Hermitian diagonally dominant matrix with non-negative diagonal elements is +ve semi-definite. Pf: \(x^{T}Ax=\sum_{i,j}A_{i,j}x_{i}x_{j} \geq \sum_{i<j}|A_{i,j}|(|x_{i}|-|x_{j}|)^{2} \geq 0\), using diagonal dominance to bound the diagonal terms. The decomposition reminds one of properties of the graph Laplacian. Alternate pf: take an ev u; from \(Au=\ew u\), show \(\ew \geq 0\).

If symmetry condition is dropped, +ve semidefiniteness need not hold.

Other Matrices of note

Interesting matrix types

Block matrix; Block tridiagonal matrix.

Triangular matrix

Inverse of a triangular L, say L’, is easy to find (by substitution) and is also triangular, with \(L'_{i,i} = L_{i,i}^{-1}\).

ew are on diagonal.

Polynomial matrix

\(P=\sum_n A^{(n)}x^{n}\): also a matrix of polynomials.

Normal matrix

\(A^{*}A= AA^{*}\). By the spectral thm, \(A=Q\EW Q^{*}\). Exactly the class of unitarily (orthogonally) diagonalizable matrices.

Let \(a=(a_{i,i})\), \(\ew=(\ew_{i})\): by direct computation, \(a=S\ew\), where \(S=[|q_{ij}|^{2}]=Q \circ \conj{Q}\) is stochastic.

Rank 1 perturbation of I

\(A=I+uv^{*}\). Easily invertible: \(A^{-1} = I + auv^{*}\) for some scalar a; here \(a = -1/(1+v^{*}u)\) (Sherman-Morrison), provided \(1+v^{*}u \neq 0\).
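
A numpy sketch verifying the scalar \(a = -1/(1+v^{*}u)\) above:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(10)
u, v = rng.standard_normal(4), rng.standard_normal(4)

A = np.eye(4) + np.outer(u, v)
a = -1.0 / (1.0 + v @ u)               # the Sherman-Morrison scalar
A_inv = np.eye(4) + a * np.outer(u, v)
assert np.allclose(A @ A_inv, np.eye(4))
\end{verbatim}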

k partial isometry

\(A=U\SW V^{*}\) with \(\SW=\mat{I_k & 0\\ 0 & 0}\).

Positive matrix A

Get Doubly stochastic matrix

This is important in some applications: like making a composite matrix from the social and affiliation networks.

If \(A=A^{*}\), this can be done by first dividing by the largest row sum, and then adding appropriate entries to the diagonal.

Can do Sinkhorn balancing: iteratively a] do row normalization, b] column normalization.
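
A sketch of Sinkhorn balancing on a positive matrix (fixed iteration count for simplicity):

\begin{verbatim}
import numpy as np

def sinkhorn(A, iters=200):
    """Alternate row and column normalizations of a positive matrix."""
    M = np.array(A, dtype=float)
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)   # a] row normalization
        M /= M.sum(axis=0, keepdims=True)   # b] column normalization
    return M

A = np.random.default_rng(11).uniform(0.1, 1.0, (4, 4))
M = sinkhorn(A)
print(M.sum(axis=1), M.sum(axis=0))         # both approach vectors of ones
\end{verbatim}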

Stochastic matrices

Definition

If A is stochastic, \(A1 = 1\), \(A \geq 0\) (elementwise).

Eigenspectrum and norm: Square A

1 is an ev

If A is stochastic, \(A1 = 1\). So, 1 is an ew, and the all-ones vector \(1\) is the corresponding ev.

By the Gerschgorin thm, \(|\ew(A)| \leq 1\); so, \(\ew_{max}=1\).

If A is also real and symmetric, can get the SVD from the eigenvalue decomposition, and \(\sw_{max}=\norm{A}_{2}=1\).

Product of stochastic matrices

So, if A, B are stochastic, \(AB \geq 0\) and \(AB1 = A1 = 1\): AB is also stochastic. Likewise \(1^{*}AB = 1^{*}\) when A, B are doubly stochastic.

Doubly Stochastic matrix S

S is bistochastic if \(S \geq 0\), \(S1 = 1\), \(1^{*}S = 1^{*}\).

(Birkhoff): {S} = the set of finite convex combos of permutation matrices \(P_i\). Pf: if S is a convex combo of \(\{P_i\}\), S is doubly stochastic. Every \(P_i\) is an extreme point of {S}. Every non-permutation doubly stochastic matrix A is a convex combo of doubly stochastic matrices X = A + aB and Y = A - aB, where B and a are found thus: pick a nz \(a_{i,j}\) in a row with at least 2 nz entries, then pick a nz \(a_{k,j}\) in its column, then a nz \(a_{k,l}\) in that row, etc., till you hit an entry seen before; take this sequence T, set a = min T; make a ±1/0 matrix B by setting the entries corresponding to alternate elements of T to 1 or -1. {S} is a compact and convex set with \(\{P_i\}\) as its extreme points.

Doubly Substochastic matrix Q

Defined by: \(Q \geq 0\), \(Q1 \leq 1\), \(1^{*}Q \leq 1^{*}\).

For permutation matrix P, PQ or QP also substochastic.

Q is dbl substochastic iff Q has a dbl stochastic dilation S: make the deficiency vectors \(d_r = 1-Q1\), \(d_c = 1-Q^{T}1\); get the diagonal matrices \(D_r=diag(d_r)\), \(D_c=diag(d_c)\); get \(S=\mat{Q & D_r\\ D_c & Q^{T}}\).

{Q} is equivalent to the set of convex combos of partial permutation matrices: dilate Q to S, get a finite convex combo of the \(P_i\), take the convex combo of the corresponding principal submatrices.

\(Q \in R^{n \times n}\) is dbl substochastic iff there is a dbl stochastic \(S \in R^{n \times n}\) with \(Q \leq S\) (elementwise): take any Q, get a finite convex combo of partial permutation matrices; alter each to get a permutation matrix; their convex combo is S.

Large matrices

Often an approximation to infinite-size matrices (operators); so they have structure or regularity.

Sparsity

As opposed to density. Very few non-zero entries per row: \(\nu\). Can find Ax in \(O(\nu m)\), not \(O(m^{2})\), flops. Can find AB in \(O(mn\nu)\).
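
A sketch with scipy.sparse (assuming scipy is available) where the matrix-vector product costs work proportional to the number of nonzeros rather than \(O(m^{2})\):

\begin{verbatim}
import numpy as np
from scipy.sparse import random as sparse_random

m = 10_000
A = sparse_random(m, m, density=1e-4, format="csr", random_state=0)
x = np.random.default_rng(12).standard_normal(m)

y = A @ x                               # work proportional to A.nnz, not m^2
print(A.nnz, y.shape)
\end{verbatim}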

Iterative algorithms

See numerical analysis ref.