Misc functionals

Functionals over square matrices

Also see functionals over +ve definite A.

Determinant of square A

Definitions

Det(A) or \(|A|\): The recursive defn (cofactor expansion along row 1): \(det(A) = \sum_{j} a_{1,j}C_{1,j}\), where the cofactor \(C_{1,j} = (-1)^{1+j}M_{1,j}\) and the minor \(M_{1,j}\) = det of the submatrix of \(A\) with row 1 and column j deleted. Defn by properties: 1: det I = 1. 2: If 2 rows are equal, det(A) = 0. 3: det \(A\) depends linearly on the first row.
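
A quick numpy sketch of the recursive defn (my illustrative code, not from the notes; exponential time, for small \(n\) only):

```python
import numpy as np

def det_cofactor(A):
    """Recursive defn: expand det along row 0 using cofactors."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor M_{0,j}: delete row 0 and column j; (-1)**j is the cofactor sign.
        M = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(M)
    return total

A = np.random.randn(5, 5)
assert np.isclose(det_cofactor(A), np.linalg.det(A))
```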

Properties

So, there is a sign change in Det(A) with row exchanges. Also, Det(A) is unchanged under \(row_{1} += k\, row_{2}\). So, a 0 row means det \(A\) = 0. Also, for triangular L or U, just multiply the diagonal entries. Also, det \(A\) = 0 iff \(A\) is singular.

Multiplicativity and consequences

\(|A||B|=|AB|\): See that, as a function of \(A\) for fixed invertible \(B\), \(\frac{|AB|}{|B|}\) has the 3 defining properties \(|A|\) has.

So, \(|A^{-1}| = \frac{1}{|A|}\). So, \(|Q| = \pm 1\) for orthogonal \(Q\). So, comparing \(|PA| = |LU|\) with \(|A^{T}P^{T}| = |U^{T}L^{T}|\), \(Det(A^{T}) = Det(A)\). So, column operations don't alter det \(A\), there is a sign change with column exchanges, det \(A\) can be written with column cofactors, etc.
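
These consequences are easy to sanity-check numerically (a throwaway numpy sketch of mine):

```python
import numpy as np

A, B = np.random.randn(5, 5), np.random.randn(5, 5)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))   # Det(A^T) = Det(A)

Q, _ = np.linalg.qr(np.random.randn(5, 5))                # orthogonal Q
assert np.isclose(abs(np.linalg.det(Q)), 1.0)             # |Q| = +-1
```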

Connection with ew, sw

Take the characteristic polynomial \(p_A\). \(det(A) = p_{A}(0) = \left.\prod_{i} (l_{i}-l)\right|_{l=0} = \prod_{i} l_{i}\): also by considering \(A = QTQ^{*}\) (Schur). Also, \(|det(A)| = \prod_i \sw_i(A)\).
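
Numeric check of both identities (my sketch; eigenvalues of a real A may be complex, but their product is real up to rounding):

```python
import numpy as np

A = np.random.randn(6, 6)
det = np.linalg.det(A)
ew = np.linalg.eigvals(A)                 # l_i
sw = np.linalg.svd(A, compute_uv=False)   # sw_i, descending

assert np.isclose(det, np.prod(ew))       # det(A) = prod_i l_i
assert np.isclose(abs(det), np.prod(sw))  # |det(A)| = prod_i sw_i
```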

Connection with rank and semidefiniteness

Thence, det(A) = 0 \(\equiv\) \(A\) is rank deficient. \(det(A) < 0 \implies A \nsucceq 0\): the implication is one directional.

Find Det(A)

Reduce \(A\) to U, find det U. Or (Leibniz formula): for each of the n! generalized diagonals, multiply the elements; add after accounting for the permutation sign. From the linearity property, get row-exchanged diagonal matrices; consider the relation between determinant sign and row exchanges; hence note the equivalence of the definitions. Or use the eigenvalue decomposition: \(|A|=|S\EW S^{-1}|=|S||\EW||S|^{-1} = |\EW|\).
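
A sketch of the elimination route (`det_by_elimination` is my name), using exactly the properties above: row exchanges flip the sign, row operations leave det unchanged, and det of a triangular matrix is the product of its diagonal:

```python
import numpy as np

def det_by_elimination(A):
    U = A.astype(float).copy()
    n = U.shape[0]
    sign = 1.0
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))              # partial pivoting
        if p != k:
            U[[k, p]] = U[[p, k]]                        # row exchange: flip sign
            sign = -sign
        if U[k, k] == 0.0:
            return 0.0                                   # zero column => singular
        U[k+1:] -= np.outer(U[k+1:, k] / U[k, k], U[k])  # det unchanged
    return sign * np.prod(np.diag(U))

A = np.random.randn(6, 6)
assert np.isclose(det_by_elimination(A), np.linalg.det(A))
```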

Det(A) = volume spanned by the row (or column) vectors: orthogonalize (the parallelepiped becomes a rectangular box of equal volume): \(|A|\) is unchanged; now \(AA^{T}\) is a diagonal matrix; \(|AA^{T}| = |A|^{2} = vol^{2}\). Perm(A): same as the Leibniz sum for Det(A), but ignoring the permutation signs.
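
The n!-diagonal sum, with and without signs, gives det and perm respectively; a brute-force sketch of mine (`leibniz` is an illustrative name):

```python
import numpy as np
from itertools import permutations

def leibniz(A, signed=True):
    """Sum over the n! generalized diagonals; signed -> det, unsigned -> perm."""
    n = A.shape[0]
    total = 0.0
    for p in permutations(range(n)):
        term = np.prod([A[i, p[i]] for i in range(n)])
        if signed:
            # Permutation sign via inversion count.
            inv = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
            term *= (-1) ** inv
        total += term
    return total

A = np.random.randn(4, 4)
assert np.isclose(leibniz(A, signed=True), np.linalg.det(A))
perm_A = leibniz(A, signed=False)   # permanent: same sum, signs ignored
```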

Block matrices

Block LU factorization: \(X = \mat{A & B \\ C & D} = \mat{A & 0 \\ C & I} \mat{I & A^{-1}B \\ 0 & D-CA^{-1}B}\). Use this if \(A\) is square and invertible; thence \(det(X) = det(A)\,det(D-CA^{-1}B)\).

\(D-CA^{-1}B\) is the Schur complement of \(A\) in \(X\). If X is symmetric, i.e. if \(B = C^{T}\), it becomes \(D - B^{T}A^{-1}B\).
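
Numeric check of the resulting determinant identity (my sketch; A is shifted to keep it safely invertible):

```python
import numpy as np

n, m = 4, 3
A = np.random.randn(n, n) + n * np.eye(n)    # square, comfortably invertible
B, C, D = np.random.randn(n, m), np.random.randn(m, n), np.random.randn(m, m)
X = np.block([[A, B], [C, D]])

S = D - C @ np.linalg.solve(A, B)            # Schur complement of A in X
assert np.isclose(np.linalg.det(X), np.linalg.det(A) * np.linalg.det(S))
```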

Trace of A

\(tr(A) = \sum_{i} a_{i,i}\).

A linear map: tr(kA + lB) = k tr(A) + l tr(B): so both convex and concave.

Trace of AB

\(\langle A, B \rangle = tr(A^{T}B)\) is an inner product on matrices. It is the same as vectorizing \(A\) and B and applying the standard inner product.
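
Check (my sketch): for real matrices, \(tr(A^{T}B)\) equals the dot product of the vectorizations:

```python
import numpy as np

A, B = np.random.randn(4, 3), np.random.randn(4, 3)
assert np.isclose(np.trace(A.T @ B), A.ravel() @ B.ravel())
```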

If \(A, B\) are normal and commute (so are simultaneously unitarily diagonalizable), \(tr(AB) = \sum_{i} \ew_{i}(A)\ew_{\pi(i)}(B)\) for some permutation \(\pi\); for PSD such \(A, B\), the \(\ew_i\) are the \(\sw_i\). Not true in general.

(Von Neumann): \(|tr(AB)| \leq \sum_{i} \sw_{i}(A)\sw_{i}(B)\) \why.
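
Numeric check of the Von Neumann bound (my sketch; svd returns singular values in descending order, as the pairing requires):

```python
import numpy as np

A, B = np.random.randn(5, 5), np.random.randn(5, 5)
swA = np.linalg.svd(A, compute_uv=False)
swB = np.linalg.svd(B, compute_uv=False)
assert abs(np.trace(A @ B)) <= swA @ swB + 1e-12
```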

Cyclicity and similarity invariance

\(tr(AB) = tr(BA) = \sum_{i} \sum_{j} a_{i,j}b_{j,i}\); but only cyclic shifts: \(tr(ABC) = tr(BCA) \neq tr(ACB)\) in general.

Similarity invariant: \(tr(P^{-1}AP) = tr(APP^{-1}) = tr(A)\).
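
Quick check of cyclicity and similarity invariance (my sketch; the two non-cyclic orders generally print different values):

```python
import numpy as np

A, B, C = (np.random.randn(4, 4) for _ in range(3))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))   # cyclic shift
print(np.trace(A @ B @ C), np.trace(A @ C @ B))               # != in general

P = np.random.randn(4, 4)
assert np.isclose(np.trace(np.linalg.inv(P) @ A @ P), np.trace(A))
```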

Trace of outer products

\(tr(ab^{T}) = b^{T}a\) by algebra.

Connection with ew

Take the characteristic polynomial \(p_A\). tr(A) = \(\sum_{i} l_{i}\) = (up to sign) the coefficient of \(l^{m-1}\) in \(p_{A}\).

Gradient of tr

By considering \(tr(f(X + \change X)) - tr(f(X))\), get: \(\gradient_X tr(X) = I, \gradient_X tr(XA) = \gradient_X tr(AX) = A^{T}, \gradient_X tr(BXA) = \gradient_X tr(ABX) = B^{T}A^{T}\).

By considering \(tr(f(X + \change X)g(X + \change X)) - tr(f(X)g(X))\): \(\gradient_X tr(AXBX) = B^{T}X^{T}A^{T} + A^{T}X^{T}B^{T}\).
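
A finite-difference check of the last formula (my sketch; central differences against the claimed gradient):

```python
import numpy as np

n = 4
A, B, X = (np.random.randn(n, n) for _ in range(3))
f = lambda M: np.trace(A @ M @ B @ M)
grad = B.T @ X.T @ A.T + A.T @ X.T @ B.T   # claimed gradient of tr(AXBX)

eps, num = 1e-6, np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
assert np.allclose(num, grad, atol=1e-5)
```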

Functionals over +ve definite matrices

Log det divergence

\(f(A) = \log det(A) = \sum_i \log \ew_i(A) = tr(\log(A))\); often used because it is concave over the +ve definite cone (so \(-\log det\) is convex). \(\gradient \log det(A) = A^{-1}\) \why.
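
Same finite-difference test for \(\gradient \log det\) (my sketch; slogdet is used for numerical stability; for general A the gradient is \(A^{-T}\), which equals \(A^{-1}\) here since A is symmetric):

```python
import numpy as np

n = 4
M = np.random.randn(n, n)
A = M @ M.T + n * np.eye(n)               # symmetric +ve definite
f = lambda Z: np.linalg.slogdet(Z)[1]     # log det, computed stably

eps, num = 1e-6, np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        num[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
assert np.allclose(num, np.linalg.inv(A), atol=1e-5)
```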

Singular values (sw)

See SVD section.

Unitary invariance

\(\SW \in R^{m \times n}\) always, so \(\SW = \bar{\SW}\). \(\SW = U^{*}AV\): so, \(\SW\) is unitarily invariant: \(\sw_{i}(A) = \sw_{i}(Q_{1}AQ_{2})\) for unitary \(Q_{1}, Q_{2}\).

Let \(a = (a_{i,i}), \sw = (\sw_{i})\): by direct computation using \(A = U\SW V^{*}\), \(a = Z\sw\), where \(Z = [u_{ij}\bar{v}_{ij}] = (U \circ \bar{V})_{1:q, 1:q}\); take \(Q = |Z|\) entrywise; get \(|a| \leq Q \sw\). Rows: \(\norm{q_{i}^{*}}_{1}^{2} = (\sum_{j} |u_{ij}v_{ij}|)^{2} \leq \sum_{j} |u_{ij}|^{2}\sum_{j} |v_{ij}|^{2} \leq 1\), by Cauchy-Schwarz; similarly for columns, \(\norm{q_{i}}_{1} \leq 1\); so Q is doubly substochastic.

Effect of row or column deletion

\(A_{r}\): \(A\) with r rows or columns deleted; \(\sw_{k}(A) \geq \sw_{k}(A_{r}) \geq \sw_{k+r}(A)\). Prove for r=1, get the general case thence. Pf where the s-th column is deleted: for the upper bound, use \(\sw_{k}(A_{1}) = \max_{S \subset C^{n}, dim(S) = k} \min_{x \in S, \norm{x}=1} \norm{Ax}_{2}\) with the extra condition \(x \perp e_{s}\); for the lower bound use \(\sw_{k}(A_{1}) = \min_{S \subset C^{n}, dim(S) = n-k} \max_{x \in S, \norm{x}=1} \norm{Ax}_{2}\) with the extra condition. For row deletion, consider \(A^{*}\).
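
Numeric check of the interlacing under deletion of one column (my sketch):

```python
import numpy as np

m, n, s = 6, 5, 2                       # delete column s
A = np.random.randn(m, n)
A1 = np.delete(A, s, axis=1)
sw = np.linalg.svd(A, compute_uv=False)
sw1 = np.linalg.svd(A1, compute_uv=False)
for k in range(n - 1):
    assert sw[k] >= sw1[k] - 1e-12      # sw_k(A) >= sw_k(A_1)
    assert sw1[k] >= sw[k + 1] - 1e-12  # sw_k(A_1) >= sw_{k+1}(A)
```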

Sum, product of sw of square A

ew sw sum comparison

By block multiplication, for arbitrary unitary \(X\), \(Y\): \(S_{k} = X_{k}^{*}AY_{k}\) = upper-left \(k \times k\) submatrix of \(S = X^{*}AY\); so, by the deletion result, \(\sw_{i}(X_{k}^{*}AY_{k}) \leq \sw_{i}(X^{*}AY) = \sw_{i}(A)\). So, \(|det(X_{k}^{*}AY_{k})| = \prod_{i=1}^{k} \sw_{i}(X_{k}^{*}AY_{k}) \leq \prod_{i=1}^{k} \sw_{i}(A)\).

For square \(A\), take the Schur decomposition \(A = QTQ^{*}\), \(T = Q^{*}AQ\), with \((\ew_{i}) \downarrow\); take the k-principal submatrix and use block multiplication to get \(T_{k} = Q_{k}^{*}AQ_{k}\). So, \(|det(T_{k})| = \prod_{i=1}^{k}|\ew_{i}(A)| = |det(Q_{k}^{*}AQ_{k})| \leq \prod_{i=1}^{k} \sw_{i}(A)\): = for k=m.

By majorization lore, \(|tr(A)| \leq \sum_{i=1}^{q} |\ew_{i}(A)| \leq \sum_{i=1}^{q} \sw_{i}(A)\). Also, for any \(p>0, \sum_{i=1}^{q} |\ew_{i}(A)|^{p} \leq \sum_{i=1}^{q} \sw_{i}(A)^{p}\).
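
Numeric check of the Weyl-type majorization (my sketch; it holds for every partial sum, which is what is tested):

```python
import numpy as np

A = np.random.randn(6, 6)
ew = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]   # |l_i|, descending
sw = np.linalg.svd(A, compute_uv=False)
assert abs(np.trace(A)) <= ew.sum() + 1e-12
for p in (0.5, 1.0, 2.0):
    for k in range(1, 7):
        assert np.sum(ew[:k] ** p) <= np.sum(sw[:k] ** p) + 1e-9
```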

Sum of sw of matrix products

\(A \in C^{m \times p}, B \in C^{p \times n}\).

Take \(AB = U\SW V^{*}\); \(U_{k}^{*}ABV_{k} = \SW_{k}\); take the polar decomposition \(BV_{k} = X_{k}Q\); then \(Q^{2} = V_{k}^{*}B^{*}BV_{k}\), and \(det(Q^{2}) \leq \prod_{i=1}^{k} \sw_{i}(B^{*}B) = \prod_{i=1}^{k}\sw_{i}(B)^{2}\).

So, \(\prod_{i=1}^{k} \sw_{i}(AB) = |det(U_{k}^{*}ABV_{k})| = |det(U_{k}^{*}AX_{k})||det(Q)| \leq \prod_{i=1}^{k}\sw_{i}(A) \sw_{i}(B)\).

By majorization lore, for \(p>0, \sum_{i=1}^{q} \sw_{i}(AB)^{p} \leq \sum_{i=1}^{q} \sw_{i}(A)^{p}\sw_{i}(B)^{p}\).
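
Numeric check of both the product and the power-sum inequalities for \(\sw(AB)\) (my sketch):

```python
import numpy as np

A, B = np.random.randn(5, 4), np.random.randn(4, 6)
swA = np.linalg.svd(A, compute_uv=False)
swB = np.linalg.svd(B, compute_uv=False)
swAB = np.linalg.svd(A @ B, compute_uv=False)
q = min(len(swA), len(swB))
for k in range(1, q + 1):
    assert np.prod(swAB[:k]) <= np.prod(swA[:k] * swB[:k]) + 1e-9
for p in (0.5, 1.0, 2.0):
    assert np.sum(swAB[:q] ** p) <= np.sum((swA[:q] * swB[:q]) ** p) + 1e-9
```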

Sum of sw as trace maximizer of SVD-like decompositions

$$\sum_{i=1}^{k}\sw_{i}(A) = \max \set{|tr(X^{*}AY)| : X^{*}X = Y^{*}Y = I, X \in C^{m \times k}, Y \in C^{n \times k}} = \max \set{|tr(AC)| : C \in C^{n \times m} \text{ is a rank-}k \text{ partial isometry}}$$

Pf: Can get C from Y, X: \(C = YX^{*}\), \(\sw_{i}(C^{*}C) = \sw_{i}(XX^{*}) = \sw_{i}(X^{*}X) = 1\); can get Y, X from C: use the SVD: \(C = U\SW V^{*} = U_{k}V_{k}^{*}\). \(|tr(AC)| = |\sum_i \ew_{i}(AC)| \leq \sum_{i=1}^{q} \sw_{i}(AC) \leq \sum_{i=1}^{q}\sw_{i}(A) \sw_{i}(C) = \sum_{i=1}^{k}\sw_{i}(A)\). Take \(A = U\SW V^{*}\); then for the rank-k partial isometry \(C = V\hat{I}_{k}U^{*}\), \(tr(AC) = tr(U\SW\hat{I}_{k}U^{*}) = \sum_{i=1}^{k}\sw_{i}(A)\).
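
Check of the maximizer construction (my sketch; for real matrices \(V = Vh^{T}\) in numpy's `U, sw, Vh = svd(A)` convention):

```python
import numpy as np

A = np.random.randn(5, 5)
U, sw, Vh = np.linalg.svd(A)
k = 3
C = Vh.T[:, :k] @ U[:, :k].T          # rank-k partial isometry V I_k U^*
assert np.isclose(np.trace(A @ C), np.sum(sw[:k]))
```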

Triangle inequality

All \(\sw_{i}\) in descending order; then: \(\sum_{i=1}^{k}\sw_{i}(A+B) \leq \sum_{i=1}^{k}\sw_{i}(A) + \sum_{i=1}^{k}\sw_{i}(B)\).

Pf: Let C range over rank-k partial isometries: $$\sum_{i=1}^{k}\sw_{i}(A+B) = \max \set{|tr((A+B)C)|} \leq \max \set{|tr(AC)| + |tr(BC)|} \leq \max \set{|tr(AC)|} + \max \set{|tr(BC)|} = RHS$$.
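
This says the Ky Fan k-norms obey the triangle inequality; a direct check (my sketch, `kyfan` is an illustrative name):

```python
import numpy as np

def kyfan(M, k):
    """Ky Fan k-norm: sum of the k largest singular values."""
    return np.sum(np.linalg.svd(M, compute_uv=False)[:k])

A, B = np.random.randn(5, 5), np.random.randn(5, 5)
for k in range(1, 6):
    assert kyfan(A + B, k) <= kyfan(A, k) + kyfan(B, k) + 1e-12
```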

Convexity, concavity

\(\sw_1(X)\) is convex: \(\sw_1(tA + (1-t)B) = \sup_{\norm{x}=1} \norm{tAx + (1-t)Bx} \leq t\sw_1(A) + (1-t)\sw_1(B)\). But \(\sw_n(X) = \ew_{min}(X)\) is concave over the +ve semidefinite cone: it is \(\min_{\norm{x}=1} x^{*}Xx\), a min of linear functions (not concave over all matrices).
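
Numeric check of the convexity of \(\sw_1\) (my sketch):

```python
import numpy as np

A, B = np.random.randn(5, 5), np.random.randn(5, 5)
s1 = lambda M: np.linalg.svd(M, compute_uv=False)[0]
for t in (0.1, 0.5, 0.9):
    assert s1(t * A + (1 - t) * B) <= t * s1(A) + (1 - t) * s1(B) + 1e-12
```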

Properties of A discerned from SVD

Rank(A) = num +ve singular values: by the geometric action, or by algebra: U, V unitary, hence invertible, so \(rank(A) = rank(U\SW V^{*}) = rank(\SW)\); also, for square \(A\), \(|det(A)| = |det(U)||det(\SW)||det(V^{*})| = det(\SW)\). Similarly, the numerical rank of \(A\) = num \(\sw\) not close to 0. Good way to find rank.

SVD is the numerically most accurate method for finding orthonormal bases for \(N(A)\) and \(range(A)\).

\(A=\sum_i \sw_{i}u_{i}v_{i}^{*}\): a \(\sum\) of rank-1 matrices.
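
A sketch pulling these together: numerical rank from the \(\sw\), the rank-1 expansion, and an orthonormal basis for \(N(A)\) (all names mine):

```python
import numpy as np

m, n, r = 6, 5, 3
A = np.random.randn(m, r) @ np.random.randn(r, n)   # rank-3 by construction
U, sw, Vh = np.linalg.svd(A)

rank = int(np.sum(sw > 1e-10 * sw[0]))              # num sw not close to 0
assert rank == r

A_rebuilt = sum(sw[i] * np.outer(U[:, i], Vh[i]) for i in range(rank))
assert np.allclose(A, A_rebuilt)                    # A = sum_i sw_i u_i v_i^*

N = Vh[rank:].T                                     # orthonormal basis of N(A)
assert np.allclose(A @ N, 0)
```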