Linear operators

Linear Transformation

Definition and Linearity

Linearity

\(A(ax+by) = aA(x) + bA(y)\). \(A\) is called an operator, from viewing vectors as functions.

Mapping between vector spaces

For any field \(F\), one can consider linear transformations \(A: F^n \to F^m\).

Applications, examples

Vector spaces can model many real-world things (see vector spaces ref), even functions. In all of these, linear transformations have deep meanings.

Over function spaces: differentiation, integration, multiplication by a fixed polynomial in \(P_n\).
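As a small sketch of this, differentiation on \(P_3\) is linear, so it has a matrix whose columns are the images of the basis vectors (the monomial basis \(\{1, x, x^2, x^3\}\) and numpy are my choices here, purely for illustration):

```python
import numpy as np

# Differentiation on P_3 in the monomial basis {1, x, x^2, x^3}:
# d/dx x^k = k x^(k-1), so column k of D is k * e_{k-1}.
D = np.zeros((4, 4))
for k in range(1, 4):
    D[k - 1, k] = k

# p(x) = 2 + 3x + 5x^3, represented by its coefficient vector.
p = np.array([2.0, 3.0, 0.0, 5.0])
dp = D @ p                       # coefficients of p'(x) = 3 + 15x^2
print(dp)                        # [ 3.  0. 15.  0.]
```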

Geometric operations \(Ax\): rotation, projection, reflection, stretching, shearing.
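A rotation and a projection as matrices acting on \(R^2\) (the angle and the test vectors are arbitrary choices, purely illustrative):

```python
import numpy as np

theta = np.pi / 2                       # rotate by 90 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
P = np.array([[1.0, 0.0],               # orthogonal projection onto the x-axis
              [0.0, 0.0]])

x = np.array([1.0, 2.0])
print(R @ x)                            # approximately [-2.  1.]
print(P @ x)                            # [1.  0.]

# Linearity: A(ax + by) = aA(x) + bA(y)
y = np.array([3.0, -1.0])
assert np.allclose(R @ (2*x + 3*y), 2*(R @ x) + 3*(R @ y))
```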

As a matrix

Use action on basis vectors

Take the linear operation \(A\) and the standard basis vectors \(\{e_i\}\) of \(dom(A)\). Take an input \(x\), which, as a combination of basis vectors, is \(x = \sum_i x_i e_i\). Now, by linearity, \(A(x) = \sum_i x_i A(e_i)\).

(So, \(\{A(e_i)\}\) spans the range space \(range(A)\)!)

Matrix

Now, write a matrix \(A\): a 2-dimensional array of numbers such that the \(i\)th column is \(a_i = A_{:,i} = A(e_i)\). This will be an \(m \times n\) array. Reason for doing this: \(Ax\) can now be defined to equal \(A(x)\).

Matrix * vector multiplication

Define \(Ax = \sum_i x_i a_i\). (Voila: a linear operation represented by matrix-vector multiplication!)

For other views of Ax, see a later section.

Vector dot product

(This also defines row vector * column vector multiplication!) This is also the standard inner product.

Row view: inner product with rows

Take the standard inner product. Then, \(Ax\) is the vector whose \(i\)th entry is \(\langle A_{i,:}, x \rangle\).
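Both views of \(Ax\) in numpy, as a quick sanity check (the matrix and vector are arbitrary examples):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [3., 1., 4.]])            # 2x3: maps R^3 -> R^2
x = np.array([1., -1., 2.])

# Column view: Ax = sum_i x_i a_i (linear combination of the columns).
col_view = sum(x[i] * A[:, i] for i in range(A.shape[1]))

# Row view: (Ax)_i = <A_{i,:}, x> (inner product with each row).
row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])

assert np.allclose(col_view, A @ x)
assert np.allclose(row_view, A @ x)
print(A @ x)                            # [-1. 10.]
```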

Changing basis vectors

Representation of the same geometric point can change with the choice of bases.

Take the point \(x = \sum_i x_i e_i\), written in the old standard basis \(\{e_i\}\). Express each \(e_i\) in terms of the new basis \(\{e'_j\}\): \(e_i = \sum_j u_{j,i} e'_j\); write these coefficients as the vector \(u_i\). From these columns, construct the matrix \(U\). Then, \(Ux = y\) is the representation of the same point in terms of the new basis \(\{e'_j\}\).
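A sketch in numpy, where the new basis is a 45-degree rotation of the standard one (an arbitrary choice); since the new basis is orthonormal, the columns of \(U\) are just inner products with the new basis vectors:

```python
import numpy as np

# New basis: the standard basis rotated by 45 degrees.
c, s = np.cos(np.pi/4), np.sin(np.pi/4)
f1 = np.array([c,  s])                  # e_1'
f2 = np.array([-s, c])                  # e_2'

# Column i of U = coordinates of e_i in the new basis {e_1', e_2'}.
E = np.eye(2)
F = np.column_stack([f1, f2])
U = F.T @ E                             # U[j, i] = <e_j', e_i>

x = np.array([1.0, 2.0])                # coordinates in the old basis
y = U @ x                               # coordinates in the new basis

# Same geometric point: reconstruct it from the new coordinates.
assert np.allclose(y[0]*f1 + y[1]*f2, x)
print(y)
```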

Representation wrt input and output bases

Similarly, representation of a linear transformation can also change with choice of bases.

When the right input and output bases are chosen, every matrix is actually diagonal, i.e. a mere scaling: see SVD.

Forward operation: Ax

Range space

As \(A\) is linear, \(range(A) = \{Ax : x \in dom(A)\}\) is the vector (sub)space spanned by the columns \(a_i\), by the definition of matrix-vector multiplication. So it is aka the column space.

By linearity, \(A\) maps \(x = x_r + x_n \in C^n\), with \(x_n \in N(A)\) and \(x_r \in N(A)^{\perp}\), to \(Ax_r \in C^m\).

Left null space

Also, the orthogonal complement of \(range(A)\) is \(N(A^T)\), the left null space; both lie in \(C^m\).

Subspaces of the domain

Null space (Kernel)

\(N(A) = \{x : Ax = 0\}\). By linearity of \(A\), this is a vector space. \(\dim N(A)\) = degrees of freedom = number of free variables.

The left null space is \(N(A^T)\).

Find null-space

\(Ax = 0\): reduce \(A\) to \(U\); identify pivot and free variables; find the values of the pivot variables in terms of the \(f\) free variables; rewrite the solution as a combination of \(f\) vectors (a basis of the null space). If \(N(A) = \{0\}\), every \(b \in range(A)\) is a unique combination of the columns \(a_i\).

Similarly, find a basis of the left null space \(N(A^T)\).
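A minimal sketch of the same recipe using sympy's exact `nullspace()`, which row-reduces and reads off pivot/free variables (the matrix is an arbitrary rank-deficient example); the left null space is just the null space of \(A^T\):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 0, 1]])              # rank 2, so a 1-dimensional null space

# Row-reduce, identify pivot/free variables, and read off the basis.
basis = A.nullspace()
print(basis)                            # e.g. [Matrix([[-1], [-1], [1]])]
for v in basis:
    assert A * v == sp.zeros(3, 1)

left_basis = A.T.nullspace()            # basis of the left null space N(A^T)
```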

Row space

This is the space spanned by the rows of \(A\), so it is \(range(A^T)\). It is \(\perp N(A)\) wrt the standard inner product; \(range(A^T) \oplus N(A) = F^n\).

Every \(x\) with \(Ax \neq 0\) has a component in the row space.

SVD view

See elsewhere.

Rank of A

Row and column ranks

The number of linearly independent rows in \(A\) is the row rank. Similarly, the column rank is defined. So, column rank = \(\dim(range(A))\).

Row rank = column rank

Do triangular row elimination to get \(PA = LU\). Then, row rank = number of non-zero rows / pivots in \(U\).

But every column \(a_i\) of \(A\) corresponding to a 0 pivot is a linear combination of the non-0 pivot columns: construct the matrix \(A'\) with only such columns, and solve \(A'x = a_i\) using triangular row elimination. So, column rank = number of non-zero pivots.

SVD view

Take the SVD: \(A = U \Sigma V^{*}\). The rank \(r\) of \(A\) corresponds to the number of non-zero \(\sigma_i\); one can take the reduced SVD: \(A = U_r \Sigma_r V_r^{*}\). \(A\) actually acts between \(r\)-dimensional subspaces.
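Numerically, the rank is usually read off the singular values with a tolerance; a small numpy sketch (the tolerance follows the usual machine-epsilon scaling, and the matrix is an arbitrary rank-2 example):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])            # rank 2 (second row = 2 * first row)

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int((s > tol).sum())                # number of non-zero sigma_i
print(s, r)                             # two significant singular values, r = 2
assert r == np.linalg.matrix_rank(A)

# Reduced SVD: A acts between r-dimensional subspaces.
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
assert np.allclose(A_r, A)
```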

Invertibility

Both the row space and \(range(A)\) must have dimension \(r\). Every \(x_r\) in \(range(A^T)\) is mapped to a unique \(Ax_r\). For invertibility, every \(Ax_r\) in \(range(A)\) must also come from a unique \(x_r\): else, if \(Ax_r - Ax_r' = 0\), then \(x_r - x_r'\) is both in the row space and in \(N(A)\), so it is 0. Also, \(A \in C^{m \times n}\) (\(m \geq n\)) has full column rank iff no two \(x_r\) are mapped to the same \(Ax_r\). So, \(A\) is invertible (both as a function and group-theoretically) iff \(r = m = n\); then every \(b\) has a unique decomposition into the basis vectors \(a_i\).

Else, use the pseudoinverse to get a left inverse if \(m > n\), or a right inverse if \(n > m\). To invert \(Ax = b\) for \(n > m\), identify \(n - m\) linearly dependent columns in \(A\), set the corresponding \(x_i = 0\), drop those columns to get \(A'\), and solve \(A'x' = b\) for \(x'\).

Identity operation

\(Ix = x\).

Matrix multiplication: AB

Composition of transformations

\(AB\) is defined to conform to the composed transformation: \((AB)x = A(Bx)\). So, matrix multiplication is associative: \(A(BC)x = (AB)Cx\).

As sum of rank 1 matrices

The outer product \(u \otimes v = uv^T\) makes a rank-1 matrix. \(AB = a_1 b_1^{*} + \dots + a_p b_p^{*}\), where \(b_i^{*} = B_{i,:}\). This ensures that \(ABx = \sum_i (B_{i,:} x) a_i\), as intended! (Remarkable!)

Similarly, for a diagonal matrix \(D\): \(ADB = \sum_i d_{i,i} a_i b_i^{*}\). So, for symmetric \(S = U \Sigma U^T\): \(A^T S B = A^T U \Sigma U^T B = \sum_i \sigma_i (U^T A)_{i,:}^T (U^T B)_{i,:} = \sum_i \sigma_i \tilde{a}_i \tilde{b}_i^{*}\), where \(\tilde{a}_i^{*}\), \(\tilde{b}_i^{*}\) are the rows of \(U^T A\), \(U^T B\).
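A quick numpy check of the rank-1 expansion, with and without a diagonal factor in between (arbitrary small random matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))         # columns a_i
B = rng.standard_normal((4, 2))         # rows b_i^T

# AB = sum_i a_i b_i^T (outer product of column i of A with row i of B).
AB = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))
assert np.allclose(AB, A @ B)

# With a diagonal D in between: ADB = sum_i d_i a_i b_i^T.
d = rng.standard_normal(4)
ADB = sum(d[i] * np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))
assert np.allclose(ADB, A @ np.diag(d) @ B)
```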

Elementwise description

From the sum-of-rank-1-matrices form: \((AB)_{i,j} = A_{i,:} B_{:,j}\).

Columns of the result

\(A\) acts on \(B\)'s rows (each row of \(AB\) is a combination of \(B\)'s rows); \(B\) acts on \(A\)'s columns (each column of \(AB\) is a combination of \(A\)'s columns). Every column of \(C = AB\) is a linear combination of \(A\)'s columns according to some column of \(B\): \(c_i = A b_i\).

Form of ABC

Consider the expansion of a quadratic functional (see vector spaces ref). Similarly, \(D = ABC\) has \(D_{i,j} = A_{i,:} (B C_{:,j}) = \sum_{k,l} A_{i,k} B_{k,l} C_{l,j}\).

So, \(D_{i,j}\) is a linear combination of the entries \(B_{k,l}\).

Computation

Needs \(O(n^3)\) operations naively. Else, needs roughly \(O(n^{2.8})\) by Strassen's algorithm. If sparse, needs \(O(n^2 \nu)\).

Rank

\(rank(AB) \leq \min(rank(A), rank(B))\): every column of \(AB\) is a combination of \(A\)'s columns, so \(rank(AB) \leq rank(A)\); every row of \(AB\) is a combination of \(B\)'s rows, so \(rank(AB) \leq rank(B)\).

Inverse and transpose

\((AB)^{-1} = B^{-1}A^{-1}\) (take \(ABx\) back to \(x\)). \((AB)^{*} = B^{*}A^{*}\) from the definition.

So, as \(I = (AA^{-1})^{*} = (A^{-1})^{*}A^{*}\), \((A^{-1})^{*} = (A^{*})^{-1}\) (aka \(A^{-*}\)). Also, \((AA^{*})^{*} = AA^{*}\): both \(AA^{*}\) and \(A^{*}A\) are Hermitian.

Block matrices

Block multiplication works: if the block sizes conform, \(\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} E \\ F \end{bmatrix} = \begin{bmatrix} AE + BF \\ CE + DF \end{bmatrix}\).

Other matrix products

Entrywise product

Aka Hadamard product, Schur product: \((A \circ B)_{i,j} = A_{i,j} B_{i,j}\); e.g. \(A .* A\).

Kronecker product

Aka outer product. \(A\) is \(m \times n\), \(B\) is \(p \times q\); \(C = A \otimes B\) is the \(mp \times nq\) block matrix with blocks \(C_{i,j} = A_{i,j} B\).

From the definition: for general \(A, B\), \(A \otimes B \neq B \otimes A\); \((A \otimes B)^T = A^T \otimes B^T\); \(A \otimes (B + C) = A \otimes B + A \otimes C\); \(aA \otimes bB = ab (A \otimes B)\). \((A \otimes B)(C \otimes D) = AC \otimes BD\) by block multiplication. So, \((A \otimes B)^{-1} = A^{-1} \otimes B^{-1}\). Also, using \(QA = LDU\): \(rank(A \otimes B) = rank(A)\,rank(B)\).

\(\lambda\) vector: \(\lambda(A \otimes B) = \lambda(A)\lambda(B)\): take eigenpairs of \(A\) and \(B\): \((\lambda_i, v_i), (\mu_j, u_j)\); then \((A \otimes B)(v_i \otimes u_j) = Av_i \otimes Bu_j = \lambda_i \mu_j (v_i \otimes u_j)\).
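A sketch verifying the mixed-product and eigenpair properties with numpy (random matrices, so purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((2, 2))
D = rng.standard_normal((3, 3))

# Mixed product: (A (x) B)(C (x) D) = (AC) (x) (BD).
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Eigenpairs multiply: (A (x) B)(v_i (x) u_j) = lambda_i mu_j (v_i (x) u_j).
lam, V = np.linalg.eig(A)
mu, W = np.linalg.eig(B)
i, j = 0, 1                             # pick one eigenpair from each factor
v = np.kron(V[:, i], W[:, j])
assert np.allclose(np.kron(A, B) @ v, lam[i] * mu[j] * v)
```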

Inverse operation: Solve Ax = b for x

Solvability Assumption

\(b \in range(A)\).

Left and right inverses

Right inverse: \(AA_r^{-1} = I\). Similarly, the left inverse \(A_l^{-1}A = I\) is defined.

Existence conditions

The left inverse exists exactly when \(Ax = b\) has a unique solution for every \(b \in range(A)\). The right inverse exists exactly when \(x^T A = b^T\) has a unique solution for every \(b \in range(A^T)\).

Equivalence if both exist

If both the left and right inverses exist, they're the same: \(B_l^{-1} = B_l^{-1}(BB_r^{-1}) = (B_l^{-1}B)B_r^{-1} = B_r^{-1}\).

Solutions

Full column rank

If \(A\) has full column rank, you have a unique solution; the left inverse exists, and \(x = A_l^{-1} b\). Proof: triangularization by row elimination goes through.

Rank-deficient matrix

If \(A\) is column-rank deficient, e.g. short and fat, you have an underdetermined set of linear equations: many 'equally good' solutions exist, but you may want one with certain properties (like sparsity). See numerical analysis ref for details.

Finding solutions

See numerical analysis reference for various techniques.

Finding inverses

Find the right inverse by Gauss-Jordan: make the augmented matrix \([A : I]\); use row operations to reduce it to \([I : A^{-1}]\). So, the (right) \(A^{-1}\) exists iff \(r = m\). A similar trick works for the left inverse.
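A bare-bones Gauss-Jordan sketch in numpy (partial pivoting only, no conditioning checks, so purely illustrative):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Reduce the augmented matrix [A : I] to [I : A^{-1}] by row operations."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])        # augmented matrix [A : I]
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))   # partial pivoting
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]               # swap rows
        M[col] /= M[col, col]                           # scale pivot row to 1
        for row in range(n):                            # eliminate the column
            if row != col:
                M[row] -= M[row, col] * M[col]
    return M[:, n:]                                     # right half is A^{-1}

A = np.array([[2., 1.], [5., 3.]])
assert np.allclose(gauss_jordan_inverse(A) @ A, np.eye(2))
```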

Or make the cofactor matrix \(C\), and use \(A^{-1} = \frac{C^T}{\det(A)}\).

Block matrices

Block Gaussian elimination

Take \(X = \begin{bmatrix} A & B \\ C & D \end{bmatrix}\); solve \(X \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix}\) for \(\begin{bmatrix} x \\ y \end{bmatrix}\), thence derive \(X^{-1}\). Solving for \(x\) in the top equation and substituting it in the bottom, you will have reduced the problem to:

\(\begin{bmatrix} I & A^{-1}B \\ 0 & S \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} A^{-1} & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}\), where \(S = D - CA^{-1}B\).

Block LU

Thence, get the block LU, aka Leibniz factorization: \(X = \begin{bmatrix} A & 0 \\ C & I \end{bmatrix} \begin{bmatrix} I & A^{-1}B \\ 0 & S \end{bmatrix}\), with \(S = D - CA^{-1}B\).

Schur complement

S is called the Schur complement of A in X.

Inverse of X

Do block back substitution to get \(X^{-1}\).
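A numpy check of the elimination step and the block LU factorization (random blocks with \(A\) nudged away from singular; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2)) + 2 * np.eye(2)    # keep A safely invertible
B = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 2))
D = rng.standard_normal((3, 3))
X = np.block([[A, B], [C, D]])

Ainv = np.linalg.inv(A)
S = D - C @ Ainv @ B                               # Schur complement of A in X

# Elimination step: left-multiplying X by [[A^{-1}, 0], [-C A^{-1}, I]]
# leaves the block-triangular system [[I, A^{-1}B], [0, S]].
E = np.block([[Ainv, np.zeros((2, 3))], [-C @ Ainv, np.eye(3)]])
U = np.block([[np.eye(2), Ainv @ B], [np.zeros((3, 2)), S]])
assert np.allclose(E @ X, U)

# Block LU: X = [[A, 0], [C, I]] @ [[I, A^{-1}B], [0, S]].
L = np.block([[A, np.zeros((2, 3))], [C, np.eye(3)]])
assert np.allclose(L @ U, X)
```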

Pseudoinverse for long thin A

Projector + inverse. For rectangular, non-column-rank-deficient matrices: \(m \geq n\). Takes \(b = A(x_r + x_n)\) to \(x_r\) in the row space; cannot revert \(Ax_n = 0\) (\(x_n \in N(A)\)).

\(A^{+} = (A^{*}A)^{-1}A^{*} = \hat{R}^{-1}\hat{Q}^{*}\) (as \(A = \hat{Q}\hat{R}\)) \(= V\hat{\Sigma}^{-1}\hat{U}^{*}\) (from the SVD).
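The three formulas agree for a full-column-rank tall matrix; a numpy sketch (random tall matrix, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))                    # tall, full column rank (m >= n)

# Normal-equations form: A+ = (A* A)^{-1} A*.
pinv_normal = np.linalg.inv(A.T @ A) @ A.T

# QR form: A = Q_hat R_hat  =>  A+ = R_hat^{-1} Q_hat*.
Q, R = np.linalg.qr(A)                             # reduced QR: Q is 5x3, R is 3x3
pinv_qr = np.linalg.solve(R, Q.T)

# SVD form: A = U_hat S_hat V*  =>  A+ = V S_hat^{-1} U_hat*.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
pinv_svd = Vt.T @ np.diag(1.0 / s) @ U.T

for P in (pinv_normal, pinv_qr, pinv_svd):
    assert np.allclose(P, np.linalg.pinv(A))

# A+ A = I (left inverse), but A A+ is only a projector onto range(A).
assert np.allclose(pinv_svd @ A, np.eye(3))
```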

Restriction to a subspace S

Let \(Q\) be the matrix whose columns form an orthonormal basis of \(S\). Then, the projection of \(a_i\) onto \(S\) is \(QQ^T a_i\). So, \(Ax = \sum_i x_i a_i\), when restricted to \(S\), becomes \(Ax = \sum_i x_i QQ^T a_i\). So, \(QQ^T A\) is the operator \(A\) restricted to the subspace \(S\).
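A small numpy check: build \(Q\) from an orthonormal basis of a 2-dimensional subspace \(S\) of \(R^3\) (an arbitrary random choice) and verify that \(QQ^T A\) applies \(A\) and then projects the output onto \(S\):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))

# S = span of two arbitrary vectors; orthonormalize them via QR.
S_raw = rng.standard_normal((3, 2))
Q, _ = np.linalg.qr(S_raw)                         # columns: orthonormal basis of S
P = Q @ Q.T                                        # orthogonal projector onto S

A_restricted = P @ A                               # A followed by projection onto S
x = rng.standard_normal(3)
assert np.allclose(A_restricted @ x, P @ (A @ x))  # = projection of Ax onto S
assert np.allclose(P @ A_restricted, A_restricted) # output already lies in S
```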

Submatrices

Principal submatrices: \(A_{1:k,1:k}\) for each \(k\). For each principal submatrix, a Schur complement is defined: see the determinant section.

Random matrices

Random projections

(Johnson-Lindenstrauss) See the randomized algorithms survey.