Convex functional f
Aka concave upwards, concave up or convex cup.
Domain, definitions
dom(f) always a convex set. Visualize as a cup. Epigraph of \(f\) (see analysis of functions over fields ref) is a convex set.
+++(When they say ‘convex function’, they mean ‘convex functional’.)+++
Smoothness along line segment in the domain
Equivalent definition (easy pf): Convex function: \(\forall t \in [0, 1]: f(tx + (1-t)y) \leq tf(x) + (1-t)f(y)\): comparing f(some pt on segment [x, y]) with the corresponding pt on the secant between \((x, f(x))\) and \((y, f(y))\).
Many points extension
By induction, can extend this to any convex combination of n points. Jensen’s inequality (see probability ref): simple extension of defn.
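For concreteness, the n-point form reads: $$f\left(\sum_i t_i x_i\right) \leq \sum_i t_i f(x_i), \quad t_i \geq 0, \ \sum_i t_i = 1;$$ Jensen’s inequality replaces the weights with a probability distribution: \(f(E[X]) \leq E[f(X)]\).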
Convex sublevel sets
The sublevel set \(\set{x: f(x) \leq \ga}\) is a convex set: take \(f(x), f(y) \leq \ga\); take convex combo \(z\), see \(f(z) \leq \ga\) from the defining inequality. This is important in convex optimization!
Extension of domain
For \(x \notin dom(f): \hat{f}(x) \dfn +\infty\). Preserves \(f(tx + (1-t)y) \leq tf(x) + (1-t)f(y)\) and simplifies notation. Similar extension (with \(-\infty\)) for concave functions.
Strict and strong convexity
Strict convexity
A cup, not a hyperplane: imposing curvature. \(\forall x \neq y, a \in (0, 1): f(ax + (1-a)y) < af(x) + (1-a)f(y)\).
Strong convexity with constant m
\(\gradient^{2} f(x) \succeq mI\) for some \(m > 0\). Implies strict convexity: consider the 2nd order approximation of the convex function \(f\).
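Equivalently (a standard first-order form, assuming \(f\) differentiable): for all \(x\) and directions \(d\), $$f(x + d) \geq f(x) + \dprod{\gradient f(x), d} + \frac{m}{2}\norm{d}^{2}.$$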
Restricted strong convexity at \(x\) wrt \(C\): the above inequality is required only for directions \(d \in C\).
f restricted to a line
Take \(f:R^{n} \to R\). \(f\) convex iff \(\forall x, v\): \(g(t) = f(x + tv)\) is convex in \(t\). Here \(t \mapsto x + tv\) traces a line through the domain. Good for showing convexity!
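Eg (a standard worked case): for \(f(x) = x^{T}Ax\) with symmetric \(A \succeq 0\), $$g(t) = f(x + tv) = t^{2}\, v^{T}Av + 2t\, x^{T}Av + x^{T}Ax,$$ a parabola in \(t\) with non-negative leading coefficient \(v^{T}Av\), hence convex; so \(f\) is convex.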
Gradient tests
First order condition
If \(f:R \to R\) differentiable, \(f\) convex iff \(f'(x)\) non decreasing, equivalently iff \(\forall x, y: f(y) \geq f(x) + f'(x)(y-x)\). Aka first order condition.
Pf: If \(f\) convex, taking the convex combination of \(y\) and \(x\): \(\forall t: f(x + t(y-x)) \leq (1-t)f(x) + tf(y)\). Rearranging: \(f(y) \geq f(x) + \frac{f(x + t(y-x)) - f(x)}{t}\); as \(t\to 0\), the difference quotient tends to \(f'(x)(y-x)\).
Extend to \(f:R^{n} \to R\) by considering \(f\) restricted to a line: \(f(y) \geq f(x) + \dprod{\gradient f(x),(y-x)}\). The RHS, as a function of \(y\), describes the supporting hyperplane to the epigraph of \(f\).
First order condition for non-smooth f
The subdifferential set \(\subdifferential f(x)\) (defined below) is non-empty for any \(x \in int(dom(f))\); any subgradient then plays the role of the gradient in the first order condition.
Second order condition
So, if \(f(x)\) twice differentiable, \(f\) convex iff \(f''(x) \geq 0\), ie \(f'\) non decreasing. By techniques similar to the first order condition, \(f\) convex iff \(\gradient^{2}f(x) \succeq 0\). Aka second order condition.
Supporting hyperplane to the epigraph
Differentiable f
See description of tangent hyperplanes to epigraphs in the ’topological properties of functionals’ subsection. For differentiable \(f\), the tangent hyperplane at \((x_1, f(x_1))\) is specified by \((\gradient f(x_1), -1)\). Pf: From first order condition: \(f(y) \geq f(x) + \dprod{\gradient f(x), (y-x)}\).
So, \(f(x) + \dprod{\gradient f(x), (y-x)}\) is a global lower bound on \(f\).
Non differentiable convex f: subdifferentials!
Can extend the idea of a gradient to non-differentiable convex \(f\). The subdifferential set \(\subdifferential f(x) = \set{z: \forall y, f(y) \geq f(x) + \dprod{z,(y-x)}}\).
Each \(z \in \subdifferential f(x)\) is a subgradient, a generalization of the concept of gradient. \((z, -1)\) is a supporting hyperplane to the epigraph of \(f\) at \((x, f(x))\). Also, \(f(x) + \dprod{z,(y-x)}\) is a global lower bound on \(f\).
\(\subdifferential f(x)\) is a closed, convex set.
If \(x \in int(dom(f))\), \(\subdifferential f(x)\) is non-empty and bounded.
Non subdifferentiable functionals
Eg: \(-\sqrt{x}\) on \(x \geq 0\): the only supporting hyperplane to the epigraph at \((0, 0)\) is vertical, so \(\subdifferential f(0) = \emptyset\).
Supporting hyperplanes to sublevel sets
See the description of the connection between contour graphs and the gradient in the section on derivatives of functionals. At differentiable points, gradients define tangent hyperplanes to the contour; for convex functionals, these are also supporting hyperplanes to the sublevel set.
At non differentiable points, the gradient can be substituted with a subgradient.
Finding Subdifferentials
First identify the non-differentiable points, then just apply the definition.
If \(f\) is differentiable at \(x\), \(\subdifferential f(x) = \set{\gradient f(x)}\).
Unlike finding gradients
Where \(f\) is differentiable, to find \(\gradient f(x)\), one can merely compute \(\partder{f(x)}{x_i} \forall i\), and arrange the result in a vector.
For finding the subdifferential of \(f\) at a non-differentiable point, this does not work. Taking the OR of conditions describing subdifferentials of restrictions of the function to individual axes yields a superset of the function’s subdifferential! For example, applying this flawed technique to finding \(\subdifferential \norm{0}_2\) yields the unit ball of \(\norm{\cdot}_{\infty}\), rather than the correct unit ball of \(\norm{\cdot}_{2}\). +++(Can’t do it one coordinate at a time.)+++
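To see the failure concretely (a worked check): \(z = (1, 1)\) lies in the \(\norm{\cdot}_{\infty}\) unit ball, yet taking \(d = (1, 1)\): \(\norm{d}_2 = \sqrt{2} < \dprod{z, d} = 2\), violating the requirement \(\norm{d}_2 \geq \dprod{z, d}\); so \(z \notin \subdifferential \norm{0}_2\).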
Of non-negative linear combo
Take \(f(x) = ag(x) + a' r(x);\ a, a' \geq 0\). \(\subdifferential f(x) \supseteq \set{az' + a'z: z' \in \subdifferential g(x), z \in \subdifferential r(x)}\): applying dfn.
Of penalties plus regularizers
Commonly encountered in statistics and modelling.
If \(f(x) = g(x) + r(x)\) is convex, \(g\) is differentiable but \(r\) is not: \(\subdifferential f(x) = \set{\gradient g(x) + z: z \in \subdifferential r(x)}\).
Eg: Take \(f(x) = g(x) + l\norm{x}_1\), where \(g\) is convex and differentiable. Here, \(\subdifferential f(x) = \set{\gradient g(x) + lz : z_i = sgn(x_i) \text{ if } x_i \neq 0; |z_i| \leq 1 \text{ otherwise}}\).
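A minimal numerical sketch of how this is used: the optimality condition \(0 \in \subdifferential f(x)\) becomes a componentwise test on \(\gradient g(x)\). The function name and the quadratic choice of \(g\) below are illustrative assumptions, not from the source.

```python
import numpy as np

def is_l1_stationary(x, grad_g, l, tol=1e-8):
    """Check 0 in subdifferential of f = g + l*||.||_1 at x.

    On the support of x, grad g(x)_i must equal -l*sgn(x_i);
    off the support, |grad g(x)_i| may be at most l.
    """
    on = x != 0
    ok_on = np.allclose(grad_g[on], -l * np.sign(x[on]), atol=tol)
    ok_off = np.all(np.abs(grad_g[~on]) <= l + tol)
    return ok_on and ok_off

# Illustrative g(x) = 0.5*||x - b||^2, so grad g(x) = x - b.
b = np.array([2.0, 0.5, -0.2])
l = 1.0
x = np.array([1.0, 0.0, 0.0])  # soft-thresholding of b at level l
print(is_l1_stationary(x, x - b, l))  # True
```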
Of f(Ax + b)
Let \(g(x) \dfn f(Ax + b)\); then \(\subdifferential g(x) = A^{T} \subdifferential f(Ax + b)\). Pf: $$\subdifferential g(x) = \set{t': \forall d: f(Ax + b + Ad) \geq f(Ax + b) + \dprod{t', d}} \\ = \set{A^{T}t: \forall d: f(Ax + b + Ad) \geq f(Ax + b) + \dprod{A^{T}t, d}} \\ = A^{T} \subdifferential f(Ax + b)$$.
In particular, taking \(A = I\): a linear shift \(x \to x+b\) does not change the subdifferential set: \(\subdifferential g(x) = \subdifferential f(x + b)\).
Of max of functionals
Take \(f(x) = \max \set{f_1(x), \dots, f_m(x)}\), each \(f_i\) convex. Take \(act(x) = \set{i: f_i(x) = f(x)}\) (the active functions); then \(\subdifferential f(x) = conv(\union_{i \in act(x)} \subdifferential f_i(x))\).
This property can be used to find subdifferential of the \(\infty\) norm.
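Eg (a one-line worked case): \(|x| = \max\set{x, -x}\); at 0 both functions are active, so \(\subdifferential |0| = conv(\set{1, -1}) = [-1, 1]\).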
Subdifferentials of norms
For any norm, subdifferential at 0 is just the unit ball of the corresponding dual norm!
Proof
\(\subdifferential \norm{0} = \set{z: \forall d: \norm{0+d} \geq \norm{0} + \dprod{d, z}} = \set{z: \forall d: \norm{d} \geq \dprod{d, z}} = \set{z: \forall d, \norm{d} = 1: 1 \geq \dprod{d, z}} = \set{z: \norm{z}^{D} \leq 1}\).
For \(\norm{x}_\infty\), many other points are non differentiable. Using the form of \(\subdifferential \max f_i(x)\), we get: for \(x\neq 0\): \(\subdifferential \norm{x}_\infty = conv(\set{sgn(x_i)e_i: |x_i| = \norm{x}_\infty})\).
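Eg: for \(x = (3, -3, 1)\), the max is attained at \(i = 1, 2\), so \(\subdifferential \norm{x}_\infty = conv(\set{e_1, -e_2})\).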
Operations which preserve convexity
What preserves the curvature? Let \(f\), g be convex.
Sum, max
\(a_1 f(x)+ a_2 g(x): a_i \geq 0\) convex.
\(\max(f, g)(x)\), \(\sup_{f \in F}f(x)\) convex. Eg: \(\sup_{c \in C} \norm{c-x}\), \(\ew_{max}(A)\). This is widely used: eg, concavity of the dual function, convexity of the conjugate function.
Minimization over some dimensions
Take \(f(x, y)\) jointly convex in \((x, y)\), \(C\) a convex set. From defn, \(g(x) = \inf_{y \in C} f(x, y)\) also convex. Eg: \(dist(x, C) = \inf_{y \in C} \norm{x - y}\).
Composition with affine transform
If \(f(x)\) is convex, so is \(g(x) = f(Ax+b)\): from defn. Concavity is also preserved.
Other compositions
Take \(h:R^{k}\to R\) convex. If \(g:R^{n} \to R^{k}\) is componentwise convex and \(\hat{h}\) is componentwise non-decreasing, then \(h(g(x))\) is convex: from defn. Also, if \(g\) is concave and \(\hat{h}\) is componentwise non-increasing, \(h(g(x))\) is convex. Eg: \(e^{g(x)}\) is convex for convex \(g\).
Transformations
Perspective of a functional (not same as perspective fn): See \cite{cvx_boydVan}.
The conjugate \(f^{*}(y) = \sup_{x} (\dprod{y, x} - f(x))\) is also convex, even for non-convex \(f\): it is a supremum of affine functions of \(y\).
Important instances
In R
Affine functional: \(ax + b\) (also concave).
Exponential \(e^{ax} \forall a\).
Powers \(x^{a}\) on \(x > 0\), \(\forall a \notin (0,1)\); concave for \(a \in [0, 1]\).
Negative entropy: \(x \log x\) (on \(x > 0\)).
Matrix functionals
\(f(Y) = \log \det Y\) on \(Y \succ 0\) is concave (so \(-\log \det Y\) is convex); \(\gradient f(Y) = Y^{-1}\).
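A standard derivation, via restriction to a line (see above): with \(Y \succ 0\) and \(\lambda_i\) the eigenvalues of \(Y^{-1/2}VY^{-1/2}\), $$g(t) = \log\det(Y + tV) = \log\det Y + \sum_i \log(1 + t\lambda_i),$$ which is concave in \(t\). Differentiating at \(t = 0\): \(g'(0) = \sum_i \lambda_i = tr(Y^{-1}V) = \dprod{Y^{-1}, V}\), giving \(\gradient f(Y) = Y^{-1}\).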
Log sum exponents
\(f(x) = \log \sum_i e^{x_i}\) is convex: its Hessian \(H \succeq 0\) (can’t simply use composition rules!). Pf: Let \(z_i = e^{x_i}\); then \(\gradient f(x) = (\sum_i z_i)^{-1}z\), and the Hessian is \(H = (\sum_i z_i)^{-2}((\sum_i z_i)diag(z) -zz^{T})\). So \((\sum_i z_i)^{2}x^{T}Hx = (\sum_i z_i)(\sum_i x_i^{2}z_i) - (x^{T}z)^{2}\). Take \(a_i = z_i^{1/2}, b_i = x_i z_i^{1/2}\); use Cauchy-Schwarz to see \(x^{T}Hx \geq 0\).
So, its composition with an affine transformation, \(\log \sum_i e^{a_i^{T}x + b_i}\), is also convex.
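A quick numerical sanity check of the Hessian formula and its positive semidefiniteness (a sketch; the helper name is illustrative):

```python
import numpy as np

def lse_hessian(x):
    """Hessian of log-sum-exp: (diag(z)*s - z z^T) / s^2, with z = exp(x)."""
    z = np.exp(x - x.max())  # rescaling z leaves the formula invariant
    s = z.sum()
    return (np.diag(z) * s - np.outer(z, z)) / s**2

rng = np.random.default_rng(0)
for _ in range(100):
    H = lse_hessian(rng.normal(size=5))
    assert np.linalg.eigvalsh(H).min() >= -1e-12  # H is PSD
```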
Importance
The convexity of these functionals is important because they appear in the negative log likelihood functions of exponential families of distributions, which must be minimized during maximum likelihood estimation.
Other examples
Norms. Each component of an affine function: \((Ax + b)_i\).
Convex quadratic functionals
Of the form \(f(x) = x^{T}Ax + h^{T}x + c\), with \(A \succeq 0\). When \(A \succ 0\), the sublevel sets are ellipsoids.
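Convexity follows from the second order condition: with symmetric \(A\), $$\gradient f(x) = 2Ax + h, \qquad \gradient^{2} f(x) = 2A \succeq 0.$$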
Other Functional-classes defined using convexity
Concave functionals
Concave function: -f is convex, domain is still convex.
Any affine function \(y = ax+b\) is both concave and convex.
\(f(x, y) = xy\) is not jointly convex in \((x, y)\): its graph is a saddle; but it is affine (hence convex) in \(x\) for fixed \(y\), and vice versa.
Linear functionals
These are both convex and concave.
Quasi-convex functionals
Convex sublevel sets
Any function with convex domain and convex sub-level sets. Eg: \(|x|\). Similarly, quasi-concave and quasi-linear functionals are defined.
So, each sublevel set can be replaced with a sublevel set of some convex functional. This is used in quasi-convex programming.
Smoothness along line segment in the domain
\(\forall t \in [0, 1]: f(tx + (1-t)y) \leq \max\set{f(x), f(y)}\): else the \(\ga = \max\set{f(x), f(y)}\) sublevel-set of \(f\) would not be convex. So, there can’t be local hills or craters, but there can be plateaus on the way down to any of the global minima.
First order condition
f quasiconvex iff \(f(y) \leq f(x) \implies \gradient f(x)^{T}(y-x) \leq 0\): otherwise, one could take a small step from \(x\) along the segment towards \(y\), and the value of \(f\) would exceed \(\max\set{f(x), f(y)} = f(x)\), contradicting the segment inequality above. Visualize with contour maps in case of a 2D functional.
Importance
Sublevel sets of quasi-convex functions, being convex, can be used to specify the feasible region of a convex optimization program.
Log concave functional f
\(\log f\) is concave (with \(f > 0\)).
Importance
Many probability density functions are log concave (eg: Gaussian, exponential).
Properties
Directly from concavity of \(\log f\): \(f(tx + (1-t)y) \geq f(x)^{t}f(y)^{1-t}\); if \(f\) twice differentiable: \(f(x)\gradient^{2}f(x) \preceq \gradient f(x) \gradient f(x)^{T}\).
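The second-order form follows by expanding the condition \(\gradient^{2} \log f(x) \preceq 0\): $$\gradient^{2} \log f(x) = \frac{f(x)\gradient^{2}f(x) - \gradient f(x)\gradient f(x)^{T}}{f(x)^{2}} \preceq 0.$$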
If \(f(x, y)\) is jointly log concave, \(g(x) = \int f(x, y)dy\) is also log concave; but the sum of log concave functionals is not necessarily log concave.
The convolution of two log concave functionals is also log concave: a consequence of the integration result above.