03 Averaging using the pdf

Consider the real-valued random variable \(X: (S, B) \to (R, B_r, m)\), whose pdf \(f_X\) is defined relative to the reference measure \(m\).

Mean/ Expectation of real valued RV

Aka Expected value. \(E:\set{RV} \to R\). \(E[X] = \mean = \int_{X} x f_X(x) dm = E_{X}[X]\).

This is the weighted average of \(range(X)\). \(E[X]\) is actually a convex combination of points in range(X).
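A minimal numeric sketch of this weighted average for a discrete RV (hypothetical values and pmf; with counting measure as the reference measure, the integral is a sum):

```python
import numpy as np

# Hypothetical discrete RV: range(X) and its pmf (weights summing to 1).
x = np.array([0.0, 1.0, 2.0, 5.0])
p = np.array([0.1, 0.4, 0.3, 0.2])

# E[X] as a convex combination of the points in range(X).
expectation = np.dot(p, x)
print(expectation)  # 0*0.1 + 1*0.4 + 2*0.3 + 5*0.2 = 2.0
```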

Subscript notation

See probability section.

Conditional Expectation

Conditional expectation of X wrt event A: \(E_{X}[X|A]\) is computed using the conditional pdf \(f_{X|A}(x)\). Sometimes, this is considered as a function of variable \(A\).

For events \(A\) with non-zero probability measure, this is \(E[X|A] = \frac{E[X I_A(X)]}{Pr(A)}\), where \(I_A\) is the indicator of \(A\).
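A sketch checking that the two forms agree on a hypothetical discrete distribution, with \(A = \set{X \geq 2}\):

```python
import numpy as np

# Hypothetical discrete RV and an event A = {X >= 2} with Pr(A) > 0.
x = np.array([0.0, 1.0, 2.0, 5.0])
p = np.array([0.1, 0.4, 0.3, 0.2])
in_A = x >= 2  # indicator I_A(x)

pr_A = p[in_A].sum()                      # Pr(A)
cond_pmf = p * in_A / pr_A                # conditional pdf f_{X|A}
e_cond = np.dot(cond_pmf, x)              # E[X | A] via the conditional pdf
e_ratio = np.dot(p, x * in_A) / pr_A      # E[X I_A(X)] / Pr(A)
print(e_cond, e_ratio)                    # both 3.2
```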

Expectation: Properties

(Tower property, aka law of total expectation.) \(E_Y[E_{X}[X|Y]] = E_Y[f(Y)] = E_{X}[X]\), where \(f(Y) \equiv E_{X}[X|Y]\) is viewed as a RV that is a function of \(Y\).
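A sketch verifying the tower property on a small hypothetical joint pmf:

```python
import numpy as np

# Hypothetical joint pmf of (X, Y); rows index x values, columns index y values.
x_vals = np.array([1.0, 2.0, 3.0])
y_vals = np.array([0.0, 1.0])
joint = np.array([[0.10, 0.20],
                  [0.15, 0.25],
                  [0.05, 0.25]])  # entries sum to 1

p_y = joint.sum(axis=0)                                      # marginal pmf of Y
e_x_given_y = (joint * x_vals[:, None]).sum(axis=0) / p_y    # f(y) = E[X | Y=y]
lhs = np.dot(p_y, e_x_given_y)                               # E_Y[E_X[X|Y]]
rhs = np.dot(joint.sum(axis=1), x_vals)                      # E_X[X]
print(lhs, rhs)                                              # both 2.0
```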

Connection to probability measure

\(Pr(B) = E_{X}[Pr(B|X)]\).

If \(X\) is an indicator RV, \(E[X] = Pr(X = 1)\).

Products of independent RV

If expectations are finite and \(X \perp Y\): \(E_{X, Y}[XY] = E_{X}[X]E_Y[Y]\), as independence gives \(f_{X, Y}(x, y) = f_X(x)f_Y(y)\), so the double integral factors into a product of integrals.
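A sketch, assuming a joint pmf built as the outer product of hypothetical marginals (i.e. independence):

```python
import numpy as np

# Hypothetical independent discrete RVs: joint pmf is the outer product of marginals.
x_vals, px = np.array([1.0, 2.0]), np.array([0.3, 0.7])
y_vals, py = np.array([0.0, 4.0]), np.array([0.5, 0.5])
joint = np.outer(px, py)                          # f_{X,Y}(x,y) = f_X(x) f_Y(y)

e_xy = (joint * np.outer(x_vals, y_vals)).sum()   # E_{X,Y}[XY]
print(e_xy, np.dot(px, x_vals) * np.dot(py, y_vals))   # both 3.4
```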

Linearity properties

For a constant \(k\): \(E[k]=k\).

Linearity in X

This follows from the linearity of integration: \(E[\sum_{i} X_{i}] = \int_{X} (\sum_{i} x_{i}) f_X(x) dm = \sum_{i} \int_{X} x_{i} f_X(x) dm = \sum_{i} \int_{X_i} x_{i} f_{X_i}(x_i) dm = \sum_{i} E[X_{i}]\), where \(f_X\) is the joint pdf of \((X_i)\) and marginalizing out the other coordinates yields \(f_{X_i}\).

Expectation is linear, even if the summed RVs are dependent.

\exclaim{E[X] is convex!}

Importance

\exclaim{Powerful, unintuitive!} 10 folks go to a ghoShTi (gathering), leave their hats, and each retrieves a random hat afterwards. What is the expected number of people who retrieve the hat they came with? Linearity greatly simplifies the calculation: writing the count as a sum of indicators, each person gets their own hat with probability \(\frac{1}{10}\), so the expectation is \(10 \cdot \frac{1}{10} = 1\), even though the indicators are dependent.
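A Monte Carlo sketch of the hat problem (hypothetical trial count), consistent with the linearity answer of 1:

```python
import numpy as np

# Monte Carlo sketch: 10 people retrieve hats uniformly at random.
rng = np.random.default_rng(0)
n_people, n_trials = 10, 100_000

matches = np.empty(n_trials)
for t in range(n_trials):
    perm = rng.permutation(n_people)                      # random assignment of hats
    matches[t] = np.sum(perm == np.arange(n_people))      # people who got their own hat

# By linearity, E[matches] = n_people * (1 / n_people) = 1, despite dependence.
print(matches.mean())   # close to 1
```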

Linearity in pdf

Consider \(E[X]\), where \(f_X(x)\) is a convex combination of \((f_i(x))\) with coefficients \((a_i)\). Because of the linearity of integration, \(E[X]\) is linear in the pdf components: \(E[X] = \sum_i a_i E_{f_i}[X]\).
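A sketch checking this on a hypothetical two-component mixture over a common discrete support:

```python
import numpy as np

# Hypothetical mixture: f_X = a_1 f_1 + a_2 f_2 over the same support.
x = np.array([0.0, 1.0, 2.0])
f1 = np.array([0.5, 0.3, 0.2])
f2 = np.array([0.1, 0.1, 0.8])
a = np.array([0.6, 0.4])

mixture = a[0] * f1 + a[1] * f2
lhs = np.dot(mixture, x)                               # E[X] under the mixture pdf
rhs = a[0] * np.dot(f1, x) + a[1] * np.dot(f2, x)      # sum_i a_i E_{f_i}[X]
print(lhs, rhs)                                        # both 1.1
```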

Convex function of E[] inequality

Aka Jensen’s inequality.

If \(f\) is convex, \(E[f(X)] \geq f(E[X])\):

Proof

\(E[X]\) is actually a convex combination of points in range(X): \(E[X] = \sum_i a_i x_i\) with \(a_i \geq 0, \sum_i a_i = 1\). So, by the definition of convexity, \(f(\sum_i a_i x_i) \leq \sum_i a_i f(x_i)\), i.e. \(f(E[X]) \leq E[f(X)]\) (see vector spaces ref).

So, taking \(f(x) = x^{2}\): \(E[X^{2}]-(E[X])^{2} \geq 0\).
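A numeric sketch of Jensen's inequality with another convex choice, \(f = \exp\), on a hypothetical pmf:

```python
import numpy as np

# Check E[f(X)] >= f(E[X]) for convex f = exp on a hypothetical discrete RV.
x = np.array([-1.0, 0.0, 2.0])
p = np.array([0.2, 0.5, 0.3])

lhs = np.dot(p, np.exp(x))     # E[f(X)]
rhs = np.exp(np.dot(p, x))     # f(E[X])
print(lhs >= rhs, lhs, rhs)    # True
```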

Expectation: Analysis tricks

If \(X\) and \(Y\) are not independent, condition on (fix) the factor in \(X\) and \(Y\) which causes the dependence; conditioned on it, they may become independent.

The fact that \(\max X \geq E[X]\) is useful for getting lower bounds on \(\max X\) and similar quantities.

Variance from mean

\(Var[X] = \stddev^{2} = E[(X-E[X])^{2}] = E[X^{2}]-(E[X])^{2}\). The weighted average of the squared deviation from the mean, over points in the sample space.

Concavity in p

Consider discrete distributions; let \(x\) be the vector of values \(X\) takes and \(p\) the pmf vector. \(Var[X] = \sum_i p_i x_i^{2} - (p^{T}x)^{2} = p^{T}(x \circ x) - x^{T}pp^{T}x\): linear in \(p\) minus a convex quadratic form in \(p\) (since \(xx^{T} \succeq 0\)), hence concave in \(p\).
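A numeric sketch of the concavity claim: the variance under a mixture of two hypothetical pmfs (fixed value vector \(x\)) is at least the mixture of the variances.

```python
import numpy as np

# Check that Var[X] is concave in the pmf p, for a fixed value vector x.
x = np.array([0.0, 1.0, 4.0])

def var(p):
    return np.dot(p, x**2) - np.dot(p, x)**2   # E[X^2] - (E[X])^2

p1 = np.array([0.6, 0.3, 0.1])
p2 = np.array([0.1, 0.2, 0.7])
lam = 0.5

mixed = var(lam * p1 + (1 - lam) * p2)
avg = lam * var(p1) + (1 - lam) * var(p2)
print(mixed >= avg, mixed, avg)   # True: concavity
```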

Other properties

\(Var[\sum_i a_{i}X_{i}] = \sum_i a_{i}^{2}Var[X_{i}]+2\sum_{i<j}a_{i}a_{j} Cov[X_{i},X_{j}]\): expand \(E[(\sum_{i} a_{i}(X_{i} - \mean_{i}))^{2}]\) using linearity of expectations.
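A sketch expanding \(a^{T}\Sigma a\) into the variance and covariance terms above, with a hypothetical covariance matrix \(\Sigma\):

```python
import numpy as np

# Var[sum_i a_i X_i] = a^T Sigma a, expanded into variance and covariance terms.
a = np.array([2.0, -1.0, 0.5])
Sigma = np.array([[1.0, 0.3, 0.1],     # hypothetical covariance matrix of (X_1, X_2, X_3)
                  [0.3, 2.0, -0.2],
                  [0.1, -0.2, 0.5]])

quad_form = a @ Sigma @ a
expanded = (np.sum(a**2 * np.diag(Sigma))
            + 2 * sum(a[i] * a[j] * Sigma[i, j]
                      for i in range(3) for j in range(i + 1, 3)))
print(quad_form, expanded)   # equal
```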

Translation invariance

\(Var[X + c] = Var[X]\) for a constant \(c\).

Moments of RV X

kth moment

\(E[X^{k}]\) is the kth moment of \(X\). The empirical kth moment is \(\frac{\sum_{i=1}^{n} X_i^{k}}{n}\).

kth moment about the mean

kth Moment of \(X\) about 0 : \(E[X^{k}]\) vs moment of \(X\) about \(\mean\) aka central moment: \(E[(X - \mean)^{k}]\).

Normalized moments: \(\frac{E[(X - \mean)^{k}]}{\stddev^{k}}\).

Important moments

Central moments are immune to translation; they describe the shape of the pdf. The 2nd central moment, aka variance, measures spread (fatness).

Skewness: \(\gamma = \frac{E[(X - \mean)^{3}]}{\stddev^{3}}\): the cubing puts more weight on points farther from the mean; the pdf has left skew if \(\gamma<0\).

Kurtosis: measures tallness/ leanness vs shortness/ squatness: \(\frac{E[(X - \mean)^{4}]}{\stddev^{4}} - 3\): the \(-3\) term ensures that the Normal distribution has kurtosis 0 (this is the excess kurtosis).

Can easily determine statistics corresponding to these parameters from a sample.
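A sketch of the corresponding sample statistics (sample skewness and excess kurtosis) on hypothetical Normal data, where both should be near 0:

```python
import numpy as np

# Sample statistics for the normalized central moments (skewness, excess kurtosis).
rng = np.random.default_rng(0)
sample = rng.normal(loc=3.0, scale=2.0, size=200_000)   # hypothetical data

mean = sample.mean()
std = sample.std()
skewness = np.mean((sample - mean)**3) / std**3              # ~0 for a Normal
excess_kurtosis = np.mean((sample - mean)**4) / std**4 - 3   # ~0 for a Normal
print(skewness, excess_kurtosis)
```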

The moments of \(X\) completely describe the pdf of \(X\) (under suitable conditions, e.g. when the moment generating function exists near 0) \why. \(N(\mean, \stddev^{2})\) is determined by its first 2 moments \chk.

Moment generating function

\(M_{X}(t)=E[e^{tX}]\). Differentiation passes inside the expectation (under regularity conditions): \(\frac{d E[f(x,t)]}{dt} = \frac{d}{dt}\int f(x, t) dx = \lim_{\del t \to 0}\frac{\int (f(x, t+\del t) - f(x, t))dx}{\del t} = E[\frac{\partial f(x,t)}{\partial t}]\). So, use it to find the nth moment of \(X\) about 0: \(E[X^{n}] = \frac{d^{n}M_{X}(t)}{dt^{n}}|_{t=0}\); can also use the Taylor series for \(e^{tX}\) with linearity of expectations.
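A symbolic sketch (using sympy, with a hypothetical discrete distribution) of recovering \(E[X^{n}]\) by differentiating \(M_{X}(t)\) at \(t = 0\):

```python
import sympy as sp

# nth moment as the nth derivative of the MGF at t = 0,
# for a hypothetical discrete RV with values x_i and probabilities p_i.
t = sp.symbols('t')
x = [0, 1, 3]
p = [sp.Rational(1, 2), sp.Rational(1, 4), sp.Rational(1, 4)]

M = sum(pi * sp.exp(t * xi) for xi, pi in zip(x, p))    # M_X(t) = E[e^{tX}]
for n in (1, 2, 3):
    moment = sp.diff(M, t, n).subs(t, 0)                # d^n M / dt^n at t = 0
    direct = sum(pi * xi**n for xi, pi in zip(x, p))    # E[X^n] computed directly
    print(n, moment, direct)                            # agree: 1, 5/2, 7
```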

It is unique: if \(M_{X}(t) = M_{Y}(t)\) (and is finite in a neighborhood of 0), \(X\) and \(Y\) have the same distribution, as it generates all possible moments of \(X\) and \(Y\) identically. \why

\(M_{\sum_i X_{i}}(t) = E[e^{t \sum_i X_{i}}] = \prod_i M_{X_{i}}(t)\) if \(\set{X_{i}} \perp\).

Characteristic function of X

\(\phi_{X}(t) = E_{X}[e^{itX}]\). Useful since it always exists, whereas the moment generating function is sometimes not well defined.

For Poisson trials

\(M_{X_{i}}(t)=1+p_{i}(e^{t}-1) \leq e^{p_{i}(e^{t}-1)}\), using \(1+x \leq e^{x}\). So, if \(X = \sum_i X_{i}\) with independent \(X_{i}\) and \(\mean = \sum_i p_{i}\), then \(M_{X}(t) \leq e^{\mean(e^{t}-1)}\).
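A numeric sketch checking the per-trial bound \(1 + p(e^{t}-1) \leq e^{p(e^{t}-1)}\) over a grid of hypothetical \(t, p\) values:

```python
import numpy as np

# Check 1 + p(e^t - 1) <= e^{p(e^t - 1)}, a consequence of 1 + x <= e^x.
t_grid = np.linspace(-2, 2, 9)
p_grid = np.linspace(0.0, 1.0, 11)
ok = all(1 + p * (np.exp(t) - 1) <= np.exp(p * (np.exp(t) - 1)) + 1e-12
         for t in t_grid for p in p_grid)
print(ok)   # True
```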