02 Probability of events

Example applications, modeling intricacies, and various interpretations of probability are discussed in the probabilistic modeling survey.

Probability measure

The following axiomatization is common to both the frequentist and subjective interpretations of probability.

Take a sample space $S$ and a sigma algebra $F$ over it. The probability measure is a special measure $\nu: F \to [0, 1]$; so it operates over sets, like the CDF. Additionally, the general additivity property (described in the algebra survey) is usually assumed.

Other properties of the measure: it should be countably additive over disjoint sets ($\sigma$-additive), and $\nu(S) = 1$.

So, $\nu$ specifies event probability. $(S, F, \nu)$ is called a probability space.

Importance

\exclaim{Viewing probability as the measure of a set of events makes many notions [even the simple union bound] much more intuitive!}

Visualization: An area of atomic events

Use a spotted 2-d compact set, whose area represents the randomness (the cross product of all the distributions) involved in the probability; the spots represent an event of interest.

Subscript notation

Consider the subscript in $\Pr_{X \sim D; X \in S}(E)$. In this case, the subscript indicates the variables which need to be specified in order to specify a point in the sample space; in doing so, it tells us where the randomness lies. It also tells us something about the range $S$ of the random variable $X$, and its distribution $D$.

Other notations

Also often used: $\Pr_X(E)$, $\Pr_D(E)$, or $\Pr_t(E)$ if the distribution $D$ is parametrized by $t$.

Importance

This notation/representation is very valuable; using it, we can clearly manipulate and reason about probability quantities, seeing, for example, how the sample space shrinks as we consider conditional probability distributions.

Empirical measure

$\nu_n(E) = \frac{1}{n} \sum_i I_E(x_i)$, where $I$ is the indicator function. This is useful in estimating the actual measure $\nu$ using experiments/sampling.
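As a minimal sketch of the empirical measure (the die and the event here are hypothetical examples, not from the text):

```python
import random

def empirical_measure(event, sample, n=10_000, seed=0):
    """Estimate nu(E) as (1/n) * sum_i I_E(x_i) from n independent samples."""
    rng = random.Random(seed)
    draws = [sample(rng) for _ in range(n)]
    return sum(event(x) for x in draws) / n

# Hypothetical example: E = "even face" of a fair six-sided die.
est = empirical_measure(lambda x: x % 2 == 0, lambda rng: rng.randint(1, 6))
print(est)  # close to the true measure nu(E) = 1/2
```

As $n$ grows, $\nu_n(E)$ concentrates around $\nu(E)$, which is what makes sampling-based estimation work.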

Conditional and unconditional probabilities

Conditional (posterior) probability $\Pr(E_1|E_2)$ takes into account the evidence that $E_2$ has occurred. This conditioning alters the measure, so that $\Pr(E_2|E_2) = 1$, and $\Pr(E_1, E_2)/\Pr(E_2) = \Pr(E_1|E_2)$ (aka the product rule). So, the sample space $S$ can be thought of as now being constricted to $E_2$.

The unconditioned measure Pr(E1) is called prior (marginal) probability.
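The product rule can be checked mechanically on a small discrete measure (the joint table below is a hypothetical example with deliberately unequal atomic weights):

```python
from fractions import Fraction as F

# Hypothetical joint measure over atomic events (a, b); weights need not be equal.
joint = {("a1", "b1"): F(1, 2), ("a1", "b2"): F(1, 4),
         ("a2", "b1"): F(1, 8), ("a2", "b2"): F(1, 8)}

def pr(pred):
    """Measure of the event {x : pred(x)} under the joint distribution."""
    return sum(w for x, w in joint.items() if pred(x))

# Product rule: Pr(E1 | E2) = Pr(E1, E2) / Pr(E2).
E1 = lambda x: x[0] == "a1"
E2 = lambda x: x[1] == "b1"
posterior = pr(lambda x: E1(x) and E2(x)) / pr(E2)
print(posterior)  # -> 4/5: conditioning rescales the measure within E2
```

Note how the denominator $\Pr(E_2)$ renormalizes the measure so that the constricted sample space $E_2$ gets total mass 1.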

Common errors

Equal weight error

Instead of calculating $\Pr(E_1, E_2)/\Pr(E_1)$, one common error is to use unweighted counts, $\#(E_1 \cap E_2)/\#(E_1)$, which assigns equal weight to every atomic event and leads to the wrong result when the weights differ. Hence, this should be avoided and the proper formalism used.

See examples provided later.

Misidentified prior error

Another common problem is misidentifying the prior event, conflating it with a different event; this leads to different weights being assigned to the probabilities involved, and can in turn lead to the equal weight error.

See examples provided later.

Illustrations

\example{Warden problem. Of 3 prisoners (P1,P2,P3) scheduled to be executed, one is pardoned. The identity of the spared prisoner is known only to the warden. P1 tries to find out about his fate.

On being pressed, the warden, reasoning that he is not leaking any information relevant to P1, only says that P2 is executed. But P1 is now happy, believing that the probability of his being pardoned has increased from 1/3 to 1/2.

The warden is correct and P1 is wrong. The reasoning follows.

Let $P_i$ also represent the event where prisoner $P_i$ is pardoned. Let $W$ represent the event where the warden tells P1 that P2 is being executed. Now, $\Pr(P_1) = 1/3$. We want to find $\Pr(P_1|W)$. $\Pr(W) = \sum_i \Pr(W \cap P_i) = 1/6 + 0 + 1/3 = 1/2$ (if P1 is pardoned, the warden names P2 or P3 with probability 1/2 each), and $\Pr(P_1 \cap W) = 1/6$, so $\Pr(P_1|W) = 1/3$. So, P1 has learned nothing about his fate.

Another source of error in this example is confusing $W$ with the event that P2 is executed ($\neg P_2$); conditioning on the latter would wrongly give $\Pr(P_1|\neg P_2) = 1/2$. }
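The calculation above can be verified by enumerating the atomic outcomes exactly (a sketch; the outcome encoding is my own, not from the text):

```python
from fractions import Fraction as F

# Atomic outcomes: (pardoned prisoner, prisoner the warden names as executed).
# The warden never names P1 and never names the pardoned prisoner;
# if P1 is pardoned, he names P2 or P3 with probability 1/2 each.
outcomes = {("P1", "P2"): F(1, 6), ("P1", "P3"): F(1, 6),
            ("P2", "P3"): F(1, 3), ("P3", "P2"): F(1, 3)}

W = lambda o: o[1] == "P2"    # warden says "P2 is executed"
P1 = lambda o: o[0] == "P1"   # P1 is pardoned

pr_W = sum(w for o, w in outcomes.items() if W(o))                   # 1/6 + 0 + 1/3
pr_P1_and_W = sum(w for o, w in outcomes.items() if P1(o) and W(o))  # 1/6
print(pr_P1_and_W / pr_W)  # -> 1/3: P1 has learned nothing
```

The unequal weights (1/6 vs. 1/3) are exactly what the equal weight error ignores: counting the two outcomes where the warden names P2 as equally likely gives the wrong answer 1/2.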

\example{Monty Hall problem. In a game show conducted by Monty Hall, there are 3 doors (P1,P2,P3), one of which has a reward. Only Monty Hall knows where. A player chooses a door, say P1. Monty Hall opens one of the other doors, say P2, and reveals it to contain no reward. Should the player switch to P3?

The same rigorous reasoning as in the case of the Warden problem can be applied to reveal that he should switch: switching wins with probability 2/3. The sources of error are also the same. }
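A quick simulation makes the 2/3 vs. 1/3 split concrete (a sketch; door indices and trial count are my own choices):

```python
import random

def monty_hall(switch, trials=20_000, seed=0):
    """Simulate the game; return the empirical win probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        reward = rng.randrange(3)
        pick = rng.randrange(3)
        # Monty opens a door that is neither the pick nor the reward.
        # (When he has a choice, which door he opens does not affect win rates.)
        opened = next(d for d in (0, 1, 2) if d != pick and d != reward)
        if switch:
            pick = next(d for d in (0, 1, 2) if d != pick and d != opened)
        wins += (pick == reward)
    return wins / trials

print(monty_hall(switch=True))   # near 2/3
print(monty_hall(switch=False))  # near 1/3
```

Switching wins exactly when the initial pick was wrong, which happens with probability 2/3; the simulation just makes that measure visible empirically.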

Independence of events

$E_1$ and $E_2$ are independent if $\Pr(E_2) = \Pr(E_2|E_1)$: so, the evidence $E_1$ does not change the probability measure as applied to $E_2$. This is the same as saying: $\Pr(E_1 \cap E_2) = \Pr(E_1)\Pr(E_2)$.
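The factored form of the definition is easy to check on a genuinely product distribution (two fair coin flips; a hypothetical example, not from the text):

```python
from fractions import Fraction as F
from itertools import product

# Uniform measure on two independent fair coin flips.
space = {o: F(1, 4) for o in product("HT", repeat=2)}

def pr(pred):
    """Measure of the event {o : pred(o)}."""
    return sum(w for o, w in space.items() if pred(o))

E1 = lambda o: o[0] == "H"   # first flip is heads
E2 = lambda o: o[1] == "H"   # second flip is heads
# Independence: Pr(E1 and E2) == Pr(E1) * Pr(E2).
print(pr(lambda o: E1(o) and E2(o)) == pr(E1) * pr(E2))  # -> True
```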

Properties of the measure

Important properties such as the inclusion/ exclusion principle, union and intersection measure bounds follow from those described for general measures.

Connection with expectation

Consider the measure $\nu$. $\Pr(E_i) = E_{\nu}[I_{E_i}(x)]$.

Probability with Multiple variables/ sigma algebras

Consider the product $(S_1 \times S_2, F_1 \times F_2, \nu)$ of the probability spaces $(S_i, F_i, \nu_i)$ for $i \in \{1, 2\}$. Consider a specific event $E \in F_1$.

The product measure

The resulting product measure v always obeys the following constraints:

$\forall E \in F_1: \nu(E \times S_2) = \nu_1(E)$. A symmetrical condition holds for all $G \in F_2$.

Marginalization

Aka the law of total probability, or marginalization. $\Pr(E) = \sum_{G \in F_2} \Pr(E, G)$, where the sum runs over a partition of $S_2$ (e.g., its atomic events): like adding up rows in a table of joint probabilities.
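The row-summing picture, made literal on a small hypothetical joint table (values are illustrative, not from the text):

```python
from fractions import Fraction as F

# Hypothetical joint table Pr(E, G), with G ranging over a partition {g1, g2, g3}.
joint = {("e", "g1"): F(1, 6), ("e", "g2"): F(1, 12), ("e", "g3"): F(1, 4),
         ("not_e", "g1"): F(1, 6), ("not_e", "g2"): F(1, 4), ("not_e", "g3"): F(1, 12)}

# Law of total probability: Pr(E) = sum_G Pr(E, G) -- adding up the E row.
pr_E = sum(w for (e, g), w in joint.items() if e == "e")
print(pr_E)  # -> 1/2
```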

Conditional probability inversion

Aka Bayes’ theorem, Bayes’s rule.

$\Pr(G|E) = \frac{\Pr(E|G)\Pr(G)}{\Pr(E)} = \frac{\Pr(E|G)\Pr(G)}{\sum_{G' \in F_2} \Pr(E|G')\Pr(G')}$.

Fixing $E$, this becomes a function $F_2 \to [0, 1]$; we can write $\Pr(G|E) \propto \Pr(E|G)\Pr(G)$.
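Bayes' rule as a computation: the denominator is the marginalization from the previous section, and the proportional form corresponds to normalizing at the end. (The prior and likelihood values below are hypothetical.)

```python
from fractions import Fraction as F

# Hypothetical prior over hypotheses G and likelihoods Pr(E | G).
prior = {"g1": F(1, 2), "g2": F(1, 3), "g3": F(1, 6)}
likelihood = {"g1": F(1, 10), "g2": F(3, 10), "g3": F(6, 10)}

# Bayes' rule: Pr(G | E) = Pr(E | G) Pr(G) / sum_G' Pr(E | G') Pr(G').
evidence = sum(likelihood[g] * prior[g] for g in prior)
posterior = {g: likelihood[g] * prior[g] / evidence for g in prior}
print(posterior)  # a normalized distribution over the hypotheses
```

Note how a hypothesis with small prior mass ("g3") can still end up with large posterior mass if its likelihood is high.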

Likelihood function

What is the likelihood of a hypothesis $E$ given the evidence $G$? We can use $f_G(E) = \Pr(G|E)$, a function of $E$ alone, as a measure of this.

For use in statistical inference, see statistics ref.

Associated quantities

Odds and log odds

Odds: $\Pr(E_1)/\Pr(\neg E_1)$. From this is defined the log odds, or logit. In logistic models, the logit is modeled, rather than the probability itself.
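A minimal sketch of the logit and its inverse (the sigmoid), which is how logistic models map between log odds and probabilities:

```python
import math

def logit(p):
    """Log odds of an event with probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

def sigmoid(t):
    """Inverse of the logit: recovers the probability from log odds."""
    return 1 / (1 + math.exp(-t))

print(logit(0.5))            # 0.0 -- even odds
print(sigmoid(logit(0.9)))   # round-trips back to 0.9
```

The logit maps $(0, 1)$ onto the whole real line, which is what makes it convenient to model with an unconstrained linear function.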