Self-Information
In information theory, the entropy of a random variable is the average level of "information",
"surprise", or "uncertainty" inherent to the variable's possible outcomes.
The self-information is a measure of the information content associated with the outcome of a
random variable. The self-information of an event X = x is defined as:
$$I(x) = -\log_2 P(X = x)$$
The base of the logarithm varies by application. Base 2 gives the unit of bits, while base e gives
the unit of nats. We can quantify the amount of uncertainty in an entire probability distribution
using the Shannon entropy.
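As a small illustration of the self-information definition above, the following Python sketch evaluates I(x) in bits for a given probability. The function name `self_information` is a hypothetical helper chosen for this example, and it assumes the probability is passed in directly as a number in (0, 1].

```python
import math


def self_information(p: float) -> float:
    """Return the self-information, in bits, of an outcome with probability p.

    Hypothetical helper for illustration: implements I(x) = -log2 P(X = x).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return -math.log2(p)


if __name__ == "__main__":
    # A fair coin flip carries 1 bit; rarer outcomes carry more information.
    for p in (0.5, 0.25, 0.125):
        print(f"P = {p}: I = {self_information(p)} bits")
```

Halving the probability of an outcome adds exactly one bit to its self-information, which is why less likely outcomes are described as more "surprising".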
Shannon Entropy
Given a discrete random variable X with possible outcomes x₁, ..., xₙ, which occur with probabilities
P(X = x₁), ..., P(X = xₙ), the entropy of X is formally defined as:
$$H(X) = -\sum_{i=1}^n P(X = x_i) \log_2 P(X = x_i)$$
where ∑ denotes the sum over the variable's possible values. Equivalently, the entropy is the
expected value of the self-information of the variable, H(X) = E[I(X)].
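The entropy formula above can be sketched in Python as follows. This is a minimal example, not a reference implementation: the name `shannon_entropy` is hypothetical, the distribution is assumed to be given as a list of probabilities, and outcomes with zero probability are skipped under the usual convention that 0 · log 0 = 0. The coin distributions used below are illustrative only and are not part of the problem set.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Return H(X) in bits for a discrete distribution given as probabilities.

    Hypothetical helper for illustration: computes -sum(p * log2(p)).
    """
    if any(p < 0.0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes contribute nothing (convention: 0 * log 0 = 0).
    return sum(-p * math.log2(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    fair_coin = [0.5, 0.5]      # illustrative distribution
    biased_coin = [0.9, 0.1]    # illustrative distribution
    print("fair coin:  ", shannon_entropy(fair_coin), "bits")
    print("biased coin:", shannon_entropy(biased_coin), "bits")
```

Each term `-p * log2(p)` is the self-information of an outcome weighted by its probability, so the sum is the expected self-information.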
➤ Problem:
Study Shannon entropy in more detail on your own, then calculate the entropy of each of the two
random variables X and Y described below.
1) Random variable X is a uniform random variable with N = 8, i.e., X ~ U(8). (25 points)
2) Random variable Y has the following probability mass function (25 points):
$$P(Y = 1) = \frac{1}{2}, P(Y = 2) = \frac{1}{4}, P(Y = 3) = \frac{1}{8}, P(Y = 4) = \frac{1}{16},$$
$$P(Y = 5) = \frac{1}{64}, P(Y = 6) = \frac{1}{64}, P(Y = 7) = \frac{1}{64}, P(Y = 8) = \frac{1}{64}.$$
3) Which random variable gives a higher entropy, X or Y? (25 points)
4) Explain why entropy is maximized by a uniform distribution. (25 points)