Differential

Definition

Rate of change

The differential function, aka derivative, is $f'(x) = \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x}$, if the limit exists.
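
A minimal numerical sketch of this limit (Python; the test function $f(x) = x^2$ and the sample point are arbitrary choices, not from these notes): as $\Delta x$ shrinks, the difference quotient approaches $f'(1) = 2$.

    # Difference quotient (f(x+dx) - f(x)) / dx for f(x) = x**2 at x = 1.
    # As dx shrinks, the quotient approaches f'(1) = 2.
    def f(x):
        return x * x

    x = 1.0
    for dx in [1e-1, 1e-3, 1e-5]:
        q = (f(x + dx) - f(x)) / dx
        print(dx, q)   # q -> 2.0 as dx -> 0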

Linear approximation view

Hence, $f(x+\Delta x) \approx f(x) + \Delta x \, f'(x)$ as $\Delta x \to 0$. As the primary use of the differential function is to be able to make linear/polynomial approximations to $f$, one can view the differential function as a measure of $f(x+\Delta x) - f(x)$ as $\Delta x \to 0$.
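
A small worked instance of this view (not in the original notes): with $f(x) = \sqrt{x}$ and $\Delta x = 0.1$ about $x = 4$, $\sqrt{4.1} \approx \sqrt{4} + 0.1 \cdot \frac{1}{2\sqrt{4}} = 2.025$, close to the true value $\approx 2.0248$.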

Other views and generalizations

See the derivatives of general functionals and functions in the vector spaces survey.

Existence: Differentiability

If the above limit exists at a certain point $x$, $f$ is said to be differentiable at that point.

$f$ can be differentiable on $(a,b)$ but not at the endpoints of $[a,b]$.

Relationship with continuity

If $f$ is differentiable, $f$ is continuous; but not the reverse: consider $|x|$ at $0$.

Also, $f = \sum_{n=0}^{\infty} (3/4)^n g(4^n x)$, where $g(x) = g(x+2)$ and $g(x) = |x|$ on $[-1,1]$, is continuous but nowhere differentiable. Take $f_n = (3/4)^n g(4^n x)$; by the Weierstrass M-test, $\sum f_n$ converges uniformly to $f$; as the partial sums are continuous functions and converge uniformly to $f$, $f$ is continuous.
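
A rough numerical sketch of the same construction (Python; the truncation depth and sample point are arbitrary): partial sums are easy to evaluate, and difference quotients at a fixed point fail to settle to a limit as $\Delta x$ shrinks.

    # Truncated Weierstrass-type series: f(x) ~ sum_{n<N} (3/4)^n * g(4^n x),
    # where g has period 2 and g(x) = |x| on [-1, 1].
    def g(x):
        x = ((x + 1.0) % 2.0) - 1.0
        return abs(x)

    def f(x, N=25):
        return sum((0.75 ** n) * g((4.0 ** n) * x) for n in range(N))

    x = 0.3
    for dx in [1e-2, 1e-4, 1e-6]:
        print(dx, (f(x + dx) - f(x)) / dx)   # quotients oscillate / blow up rather than converging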

Smoothness

If for all $k$, $D^k f(x)$ exists at a certain point $x$, $f$ is said to be smooth at that point.

Differential operator

Definition

Consider the operator $D$, which maps a given function $f$ to its differential function $f'$. Note that the differential function may have a smaller domain than $f$.

Notation

So, $D(f)(x) = f'(x)$. Often, this is written as $Df(x)$.

Below, represent the scalar functions $f$ and $D(f)$ as vectors in a vector space: see the vector spaces and functional analysis surveys for details. Also, $f+g$, $fg$, $f/g$ etc. are naturally defined.

Higher order differentials

These are defined by $D^k(f) = D(D^{k-1}(f))$ for $k > 1, k \in \mathbb{N}$.

Other notations include: $\frac{d^k f(x)}{dx^k}$, $f^{(k)}(x)$.

Inverse

Directly from the definition, $D(f^{-1}) = \frac{1}{D(f)}$; more precisely, $(f^{-1})'(y) = \frac{1}{f'(x)}$ where $y = f(x)$.
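
For instance, with $f(x) = e^x$ and $f^{-1}(y) = \ln y$: $(\ln)'(y) = \frac{1}{f'(x)} = \frac{1}{e^{\ln y}} = \frac{1}{y}$.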

Linearity

$D(f+g) = D(f) + D(g)$ and $D(cf) = c\,D(f)$. This follows from the definition of the differential function.

Hence, the differential operator $D(\cdot)$ is a linear map between these vector spaces. With respect to a basis (e.g. the monomial basis for polynomials), it can be represented as a matrix.
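
A small sketch of this matrix view (Python/NumPy; restricting to polynomials of degree $< 4$ in the monomial basis, an illustrative choice): $D$ then acts on coefficient vectors.

    import numpy as np

    # D on polynomials of degree < 4, monomial basis [1, x, x^2, x^3]:
    # column j holds the coefficients of D(x^j) = j * x^(j-1).
    D = np.array([[0., 1., 0., 0.],
                  [0., 0., 2., 0.],
                  [0., 0., 0., 3.],
                  [0., 0., 0., 0.]])

    p = np.array([5., 0., 1., 2.])   # 5 + x^2 + 2x^3
    print(D @ p)                     # [0, 2, 6, 0], i.e. 2x + 6x^2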

Other properties

If $f, g: \mathbb{R} \to \mathbb{R}$ are differentiable at a certain point $x$: $D(fg) = f\,D(g) + D(f)\,g$. This follows from the definition of the differential function. Similarly, $D(f/g) = (D(f)\,g - f\,D(g))/g^2$.

Composition (chain rule)

Suppose $g, h$ are functions. $D(g(h)) = D(h) \cdot D_h(g)$, where $D_h(g)$ is the differential function of $g$ evaluated at $h(x)$. This is proved using the linear approximation $f(x+\Delta x) \approx f(x) + f'(x)\,\Delta x$ and the definition of $D(f)$.
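
E.g., with $h(x) = x^2$ and $g(u) = \sin u$: $D(\sin(x^2)) = D(x^2) \cdot \cos(x^2) = 2x\cos(x^2)$.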

Parametrically defined functions

Suppose $y = f(t)$ and $x = g(t)$. Then $\frac{dy}{dx} = \frac{f'(t)}{g'(t)}$, as $y = f(g^{-1}(x))$.
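
E.g., $x = \cos t$, $y = \sin t$: $\frac{dy}{dx} = \frac{\cos t}{-\sin t} = -\cot t$.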

Differentiation

Differentiation is the procedure of evaluating the differential operator for a certain function.

Differentiation tricks

$y = f(x)^{g(x)}$: take log on both sides, then differentiate; chain rule.
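
E.g., $y = x^x$: $\ln y = x \ln x$, so $\frac{y'}{y} = \ln x + 1$ and $y' = x^x(\ln x + 1)$.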

Differentials of important functions

Ab initio differentiation of $\sin x$, $\cos x$.

Differentiation of powers and exponents

$x^n: n \in \mathbb{N}$ can be derived from the definition. Thus, using the series expansion/definition of $e^x$, $D(e^x)(x) = e^x$.

For $n \in \mathbb{R}$: $\frac{d x^n}{dx} = \frac{d\, e^{n \ln x}}{dx} = e^{n \ln x} \cdot \frac{n}{x} = n x^{n-1}$.

Other properties

Geometry

Geometry is described for the case of general functionals in the vector spaces survey.

Connection with extrema

If $f$ is differentiable at an interior extremum (minimum / maximum), $f'$ is $0$ there. See optimization ref.

Effect of uniform convergence

Take $f_n$ on $[a,b]$, $f_n \to f$ uniformly.

$f_n = \sin(nx)/n \to 0$ uniformly, but $f_n'(x) = \cos(nx) \not\to 0 = f'(x)$.
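
A quick numerical check of this example (Python; the sample point is arbitrary):

    import math

    # f_n(x) = sin(n x)/n -> 0 uniformly, but f_n'(x) = cos(n x) does not converge to 0.
    x = 1.0
    for n in [10, 100, 1000]:
        fn = math.sin(n * x) / n
        dfn = math.cos(n * x)
        print(n, fn, dfn)   # fn -> 0, dfn keeps oscillating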

If the $f_n'$ are continuous and uniformly convergent, and if for some $c \in (a,b)$, $f_n(c)$ converges (this condition ensures we're not adding constants to $f_n$ that keep $f_n'$ convergent but $f_n$ non-convergent); then $\exists f: \lim_n f_n = f$, $f$ is differentiable, and $\lim f_n' = f'$. Pf: Let $\lim f_n' = g$. From the fundamental theorem of calculus: $\int_c^x f_n'(t)\,dt = f_n(x) - f_n(c)$; take limits on both sides to get $\int_c^x g(t)\,dt = f(x) - f(c)$; $g$ is continuous as the $f_n'$ are continuous and uniformly convergent to $g$; so use the fundamental theorem of calculus again to conclude $f' = g$.

Polynomial approximation of f

$f'(x)$: Mean change and the gradient

Interior extremum existence

(Rolle) If $f$ is continuous and differentiable in $[x_1, x_2]$ and $f(x_1) = f(x_2) = 0$, then $\exists c \in (x_1, x_2): f'(c) = 0$.

Easy to make a visual argument.

Proof

There exists at least one maximum and one minimum in $[x_1, x_2]$; if either happens to be in the interior $(x_1, x_2)$, $f'(x) = 0$ at this point; otherwise both extrema are at the endpoints, so $f$ is a constant function, and there is still an extremum (with $f' = 0$) in $(x_1, x_2)$.

Mean value theorem

If $f$ is continuous and differentiable in $[x_1, x_2]$, then $\exists c \in (x_1, x_2): f'(c) = \frac{f(x_1) - f(x_2)}{x_1 - x_2}$. Easy to visualize. For proof, see the generalization to two functions below.

(Thence, linear approximation to f!)

Relative to another function

If $f, g$ are continuous and differentiable: $\exists c: (f(b) - f(a))\,g'(c) = (g(b) - g(a))\,f'(c)$. Make a new function, apply Rolle. Aka Cauchy's mean value theorem.

Proof

Suppose that $f(b) = f(a) + M(g(b) - g(a))$. Now, solve for $M$. Take $F(x) = f(x) - f(a) - M(g(x) - g(a))$. $F(a) = F(b) = 0$; so because of the interior extremum existence argument, there must exist some $c \in (a,b)$ with $F'(c) = 0$, i.e. $f'(c) = M g'(c)$.

Definite integral view and the mean

$\int_a^b f'(x)\,dx = f(b) - f(a) = f'(c)(b-a)$ for some $c \in [a,b]$. This can be extended to integration $\int_a^b f'(x)\,dg(x)$ wrt another function $g(x)$ using the mean value theorem relative to another function.

As integration can be viewed as an extension of summation, $\frac{\int_a^b f'(x)\,dx}{b-a} = f'(c)$ can be viewed as the mean of $f'(x)$.

Polynomial approximation

Aka Taylor's theorem. $P(a) := \sum_{k=0}^{n-1} f^{(k)}(a)(b-a)^k/k!$; then $f(b) = P(a) + R$, where $R = f^{(n)}(c)(b-a)^n/n! = \int_a^b f^{(n)}(y)(b-y)^{n-1}/(n-1)!\,dy$ for some $c \in [a,b]$.

Can then bound the error term by bounding $f^{(n)}(c)$.
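
A sketch of such a bound for $f = \sin$ about $a = 0$ (Python; degree and evaluation point are arbitrary choices): since $|f^{(n)}| \le 1$ everywhere, $|R| \le |b-a|^n/n!$.

    import math

    # Degree-7 Taylor polynomial of sin about 0, plus the remainder bound |R| <= |x|^8 / 8!.
    def taylor_sin(x, terms):
        return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
                   for k in range(terms))

    x, terms = 1.2, 4                   # 4 terms -> degree-7 polynomial
    approx = taylor_sin(x, terms)
    bound = x ** 8 / math.factorial(8)  # uses |f^(8)| <= 1
    print(approx, math.sin(x), bound)   # |approx - sin(x)| <= bound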

Proof

We want to find $f(b) - P(a)$. Note that $P(b) = f(b)$, and $P'(x) = \sum_{k=0}^{n-1}\left(f^{(k+1)}(x)(b-x)^k/k! - f^{(k)}(x)(b-x)^{k-1}/(k-1)!\right) = f^{(n)}(x)(b-x)^{n-1}/(n-1)!$, as the sum telescopes: note how $P$ was cleverly defined around $b$ rather than $a$ to let this happen.

From the mean value theorem wrt a function $g$, we get $f(b) - P(a) = P(b) - P(a) = \frac{f^{(n)}(c)(b-c)^{n-1}/(n-1)!}{g'(c)}\,(g(b) - g(a))$. Then, using alternatively $g(x) = (b-x)^n$ and $g(x) = \int_a^x f^{(n)}(y)(b-y)^{n-1}\,dy$, we get the stated remainders.

Associated series

In the proof, note that one might expect the $c_i$ to get closer and closer to $a$ as $n$ increases; but we cannot be sure of this: for all we know, all $c_i$ may be very close to $b$.

The polynomial approximation series, aka Taylor series, is the polynomial approximation $P(x)$ as the degree $n \to \infty$.

Maclaurin series: Taylor series about 0.

Importance

Polynomial approximation of functions described above is very important in analyzing the solutions to many problems. This is because one can use the nth degree approximation and upper bounds on |f(n)(x)| to get easy to analyze upper and lower bounds on f(x).

For example, this is used to prove that solutions to certain optimization problems which arise in doing maximum likelihood estimation have desirable properties. Also, several optimization algorithms work by minimizing the polynomial (quadratic in the case of Newton’s method) approximation to f(x), and this analysis is naturally used there.
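
A minimal sketch of that quadratic-approximation idea (Python; the objective $f(x) = x^4 - 3x$ is a hypothetical example): each Newton step minimizes the local quadratic approximation $f(x) + f'(x)\Delta + \tfrac12 f''(x)\Delta^2$, i.e. moves by $-f'(x)/f''(x)$.

    # Newton's method for minimization: repeatedly minimize the local quadratic
    # approximation, i.e. step by -f'(x)/f''(x). f(x) = x^4 - 3x is an arbitrary example.
    def fprime(x):
        return 4 * x ** 3 - 3
    def fsecond(x):
        return 12 * x ** 2

    x = 2.0
    for _ in range(8):
        x -= fprime(x) / fsecond(x)
    print(x, fprime(x))   # x approaches the stationary point (3/4)^(1/3), f'(x) -> 0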

Extensions

Can't easily extend to general metric spaces by using a distance function $d(\cdot, \cdot)$: rates of change based on distances can't be negative.

Extension to functionals

See vector spaces ref.