Self Information of an event
Aka surprisal. A measure of the information content of an event e with probability p(e): I(e) = -log p(e). The rarer the event, the more the information; and in case of independence the information adds, since probabilities multiply: I(e1 and e2) = I(e1) + I(e2).
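A minimal sketch of the definition above (function name `surprisal` is my choice), checking the two stated properties: rarer events carry more information, and information is additive for independent events.

```python
import math

def surprisal(p: float) -> float:
    """Self-information in bits: I(e) = -log2 p(e)."""
    return -math.log2(p)

# Rarer event -> more information.
assert surprisal(0.5) == 1.0      # a fair coin flip carries 1 bit
assert surprisal(0.25) == 2.0     # a rarer event carries more

# Independence: probabilities multiply, so information adds.
p1, p2 = 0.5, 0.25
assert math.isclose(surprisal(p1 * p2), surprisal(p1) + surprisal(p2))
```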
As code-length for recording event
Coding problem
Suppose we want to record that an event occurred, using as few bits in expectation as possible. We want to satisfy this: the more common the event, the fewer bits needed to transmit its occurrence.
Coding algorithm
We observe that there can be at most 2^l events with probability at least 2^(-l). So assign an event of probability p a codeword of length ceil(-log p); a prefix code with these lengths exists by Kraft's inequality.
This is a code with the least expected code-length, as shown in the entropy section.
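A sketch of the coding scheme above, assuming the standard Shannon code lengths ceil(-log2 p). It checks Kraft's inequality (so a prefix code with these lengths exists) and that the expected length is within 1 bit of the entropy.

```python
import math

def shannon_code_lengths(probs):
    """Assign each event a code of length ceil(-log2 p)."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.5, 0.25, 0.125, 0.125]
lengths = shannon_code_lengths(probs)

# Kraft's inequality: sum of 2^-l is at most 1, so a prefix code exists.
assert sum(2.0 ** -l for l in lengths) <= 1.0

entropy = -sum(p * math.log2(p) for p in probs)
expected_len = sum(p * l for p, l in zip(probs, lengths))
# Expected code length satisfies H <= E[len] < H + 1.
assert entropy <= expected_len < entropy + 1
```

For dyadic probabilities like these, the expected length equals the entropy exactly.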
Unit
Inspired by the code-length interpretation of surprisal. Depending on whether the logarithm is taken base 2 or base e, the unit is bits or nats.
Entropy of an RV X
Definition
Desired properties
Uncertainty associated with an RV: should not change if the probabilities are rearranged among the different values of X, i.e., entropy depends only on the multiset of probabilities, not on the labels.
As expected surprisal: H(X) = E[I(X)] = -sum_x p(x) log p(x).
Extension to 0 values
Extend the definition for p(x) = 0 by taking 0 log 0 = 0, justified by lim_{p -> 0+} p log p = 0.
Expected Information/ code-length
Entropy of X equals the expected code-length under the Shannon code: H(X) = E[-log p(X)]. It is the least expected number of bits required to transmit the value of the random process. Proof: non-negativity of the information divergence.
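A small check of the optimality claim above (function names are my choice): coding with lengths -log2 q(x) for any distribution q costs at least H(p) bits in expectation when the data comes from p, which is exactly the non-negativity of the divergence D(p || q).

```python
import math
import random

def entropy(p):
    """H(p) = -sum p log2 p, with 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected code length when data ~ p but code lengths are -log2 q(x)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]
# Coding with the true distribution achieves H(p); any other q costs more.
assert math.isclose(cross_entropy(p, p), entropy(p))
random.seed(0)
for _ in range(100):
    raw = [random.random() for _ in p]
    q = [r / sum(raw) for r in raw]
    assert cross_entropy(p, q) >= entropy(p) - 1e-12
```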
Cross entropy
H(p, q) = -sum_x p(x) log q(x): the expected code-length when the data is drawn from p but the code is optimized for q. Even though the code lengths come from q, the expectation is taken under p; H(p, q) >= H(p), with equality iff p = q.
As cross entropy relative to U
A non-uniform distribution has less entropy than the uniform distribution: H(p) = log n - D(p || U). One can use this to reduce the number of bits needed to transmit information.
Concavity in the case of a discrete distribution p: H is concave in p, so mixing distributions cannot decrease uncertainty.
Asymptotic equipartition property (AEP)
Take a binary distribution with entropy H and an iid sample of length n: with high probability the sample falls in a "typical set" of roughly 2^{nH} sequences, each of probability roughly 2^{-nH}.
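A quick simulation of the AEP statement above, assuming Bernoulli(0.3) bits: the per-symbol surprisal of a long iid sequence concentrates around the entropy H, i.e., the sequence has probability close to 2^{-nH}.

```python
import math
import random

random.seed(1)
p = 0.3                                   # Bernoulli(p) bits
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

n = 100_000
sample = [1 if random.random() < p else 0 for _ in range(n)]

# Per-symbol surprisal -log2 Pr[sample] / n concentrates around H.
neg_log_prob = sum(-math.log2(p if b else 1 - p) for b in sample) / n
assert abs(neg_log_prob - H) < 0.05
```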
Joint and cross entropy
Joint entropy
Additivity, as required: if X and Y are independent, then H(X, Y) = H(X) + H(Y).
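The additivity property above can be checked directly on a small example (distributions represented as dicts, my choice): build an independent joint p(x, y) = p(x) p(y) and compare entropies.

```python
import math
from itertools import product

def entropy(dist):
    """Entropy of a distribution given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

px = {'a': 0.5, 'b': 0.5}
py = {0: 0.25, 1: 0.75}
# Independent joint distribution: p(x, y) = p(x) p(y).
pxy = {(x, y): px[x] * py[y] for x, y in product(px, py)}

# Additivity: H(X, Y) = H(X) + H(Y) under independence.
assert math.isclose(entropy(pxy), entropy(px) + entropy(py))
```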
Cross entropy
Conditional entropy of X given Y
Mutual information of X wrt Y
This is the expected value of the information gain / code-length divergence: I(X; Y) = E_y[ D(p(X | Y=y) || p(X)) ].
As deviation from the independent distribution: I(X; Y) = D(p(x, y) || p(x) p(y)).
Conditional Mutual information wrt Z
Other information metrics
Hamming weight of x: wt(x), the number of nonzero positions. Hamming distance: d(x, y) = wt(x XOR y), the number of positions where x and y differ.
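A minimal sketch of the two definitions above for integers viewed as bit strings: the distance is the weight of the XOR.

```python
def wt(x: int) -> int:
    """Hamming weight: number of set bits in x."""
    return bin(x).count("1")

def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions where x and y differ = wt(x XOR y)."""
    return wt(x ^ y)

assert wt(0b1011) == 3
assert hamming_distance(0b1011, 0b1001) == 1
assert hamming_distance(5, 5) == 0
```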
Communication complexity
The problem
A talks to B; A knows a, B knows b; they want to compute f(a, b) with minimum communication, and local computation is free.
The easy solution is to send a and b across. But these may be large, so we want a protocol tailored to f.
Applications
VLSI, scenarios where communication is very costly.
The communication protocol tree
So, we can look at all possible communication sequences of a protocol as a protocol tree: each internal node belongs to one player and specifies, as a function of that player's input, which bit is sent next; each leaf gives the output.
Deterministic vs randomized protocols
Deterministic complexity: the number of bits transmitted by the best deterministic protocol on the worst possible input (a, b).
Randomized protocols may use public randomness or private random bits. Their complexity is the number of bits used on the worst (a, b), allowing a small error probability.
Having public random bits is not much more powerful: you can replace a protocol using public random bits with one using private random bits at a cost of only O(log n) additional bits of communication (Newman's theorem).
Computing f for k input pairs
Want to do better than k times the cost of computing f on a single pair (the direct-sum question).
Examples
Checking equality
A uses randomness r and sends the fingerprint (F(a, r), r) to B; B accepts iff F(b, r) matches.
To show that F is good: make a the coefficient vector of a polynomial over a finite field and let F(a, r) be its evaluation at the random point r. Two distinct polynomials of degree < n agree on at most n - 1 points, so the collision probability is at most (n - 1)/|field|.
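A sketch of the fingerprinting protocol above, under my own choices of field (the Mersenne prime 2^61 - 1) and Horner evaluation; function names are illustrative. Equal strings are always accepted; distinct strings are rejected except with probability at most (n - 1)/P.

```python
import random

P = (1 << 61) - 1   # a large prime, so arithmetic is over the field Z_P

def fingerprint(bits, r):
    """Evaluate the polynomial with coefficient vector `bits` at r mod P."""
    acc = 0
    for b in bits:
        acc = (acc * r + b) % P
    return acc

def equality_protocol(a, b):
    """A picks random r and sends (F(a, r), r); B compares with F(b, r)."""
    r = random.randrange(P)
    return fingerprint(a, r) == fingerprint(b, r)

random.seed(0)
a = [random.randrange(2) for _ in range(1000)]
b = list(a)
assert equality_protocol(a, b)     # equal strings are always accepted
b[500] ^= 1
# Distinct degree-<n polynomials agree on at most n-1 of the P points,
# so a false accept here has probability at most 999 / (2^61 - 1).
assert not equality_protocol(a, b)
```

The fingerprint plus the random point take O(log P) = O(log n) bits, versus n bits for sending a itself.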