4.57 Capacity of a communication channel. We consider a communication channel, with input X(t) ∈ {1, . . . , n}, and output Y(t) ∈ {1, . . . , m}, for t = 1, 2, . . . (in seconds, say). The relation between the input and the output is given statistically:
pij = prob(Y(t) = i|X(t) = j), i = 1, . . . , m, j = 1, . . . , n.
The matrix P ∈ R^{m×n} is called the channel transition matrix, and the channel is called a discrete memoryless channel.
A famous result of Shannon states that information can be sent over the communication channel, with arbitrarily small probability of error, at any rate less than a number C, called the channel capacity, in bits per second. Shannon also showed that the capacity of a discrete memoryless channel can be found by solving an optimization problem. Assume that X has a probability distribution denoted x ∈ R^n, i.e.,
xj = prob(X = j), j = 1, . . . , n.
The mutual information between X and Y is given by
I(X; Y) = ∑_{i=1}^m ∑_{j=1}^n xj pij log2(pij / ∑_{k=1}^n xk pik).
Then the channel capacity C is given by
C = sup_x I(X; Y),
where the supremum is over all possible probability distributions for the input X, i.e., over x ⪰ 0, 1^T x = 1.
Show how the channel capacity can be computed using convex optimization.
Hint. Introduce the variable y = Px, which gives the probability distribution of the output Y, and show that the mutual information can be expressed as
I(X; Y) = c^T x − ∑_{i=1}^m yi log2 yi,
where cj = ∑_{i=1}^m pij log2 pij, j = 1, . . . , n.
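As a quick sanity check of the identity in the hint, the following sketch evaluates I(X; Y) both from the double sum and from c^T x − ∑ yi log2 yi. The 2 × 3 matrix P and the distribution x are made-up illustrative values (they do not come from the exercise), and the function names are hypothetical. Terms with pij = 0 or yi = 0 are skipped, under the usual convention 0 log 0 = 0.

```python
import math

def output_distribution(P, x):
    """y = P x, the distribution of the output Y."""
    return [sum(P[i][k] * x[k] for k in range(len(x))) for i in range(len(P))]

def mutual_information_direct(P, x):
    """I(X;Y) in bits, evaluated from the double-sum definition."""
    y = output_distribution(P, x)
    total = 0.0
    for i in range(len(P)):
        for j in range(len(x)):
            if x[j] > 0 and P[i][j] > 0:  # skip terms that are 0 by convention
                total += x[j] * P[i][j] * math.log2(P[i][j] / y[i])
    return total

def mutual_information_hint(P, x):
    """I(X;Y) in bits, evaluated as c^T x - sum_i y_i log2 y_i."""
    m, n = len(P), len(x)
    y = output_distribution(P, x)
    # c_j = sum_i p_ij log2 p_ij (sign convention of the hint)
    c = [sum(P[i][j] * math.log2(P[i][j]) for i in range(m) if P[i][j] > 0)
         for j in range(n)]
    linear_part = sum(c[j] * x[j] for j in range(n))
    entropy_part = -sum(yi * math.log2(yi) for yi in y if yi > 0)
    return linear_part + entropy_part

# Illustrative data (assumed): each column of P sums to one.
P = [[0.7, 0.2, 0.1],
     [0.3, 0.8, 0.9]]
x = [0.5, 0.3, 0.2]

print(mutual_information_direct(P, x), mutual_information_hint(P, x))
```

The two printed values should agree up to rounding error, since the two expressions differ only by the substitution ∑_j pij xj = yi.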
Solution. The capacity is the optimal value of the problem
maximize f0(x) = ∑_{i=1}^m ∑_{j=1}^n xj pij log(pij / ∑_{k=1}^n xk pik)
subject to x ⪰ 0, 1^T x = 1,
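with variable x. (If log denotes the natural logarithm rather than log2, the objective is only scaled by a positive constant, so the maximizers are unchanged; the optimal value must then be divided by log 2 to obtain C in bits per second.) It is possible to argue directly that the objective f0 (which is the mutual information between X and Y) is concave in x. This can be done in several ways, starting from example 3.19.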
Another (related) approach is to follow the hint given, and introduce y = Px as another variable. We can express the mutual information in terms of x and y as
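I(X; Y) = ∑_{i,j} xj pij log(pij / ∑_k xk pik)
= ∑_{i,j} xj pij log pij − ∑_i (∑_j pij xj) log yi
= ∑_j xj ∑_i pij log pij − ∑_i yi log yi
= −c^T x − ∑_i yi log yi,
where we used ∑_k xk pik = yi, and cj = −∑_i pij log pij (the negative of the cj defined in the hint). Therefore the channel capacity problem can be expressed as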
maximize I(X; Y) = −c^T x − ∑_i yi log yi
subject to x ⪰ 0, 1^T x = 1
y = Px,
with variables x and y. The objective is the sum of a linear function of x and the entropy of y, hence concave in (x, y), and the constraints are linear, so this is a convex optimization problem.
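For a numerical illustration, the sketch below computes the capacity of a small channel. It does not call a general-purpose convex solver on the formulation above; instead it swaps in the Blahut-Arimoto iteration, a dependency-free method specialized to this same concave maximization problem. The function names, the tolerance, and the example matrix are assumptions made for illustration. The stopping test uses the bounds I(X; Y) ≤ C ≤ max_j D_j, where D_j = ∑_i pij log(pij / yi), and the code assumes every output symbol is reachable from some input.

```python
import math

def divergence_terms(P, x):
    """D_j = sum_i p_ij ln(p_ij / y_i) for each input j, with y = P x (nats)."""
    m, n = len(P), len(x)
    y = [sum(P[i][k] * x[k] for k in range(n)) for i in range(m)]
    # Terms with p_ij = 0 contribute 0 (convention 0 log 0 = 0).
    return [sum(P[i][j] * math.log(P[i][j] / y[i])
                for i in range(m) if P[i][j] > 0)
            for j in range(n)]

def channel_capacity(P, tol=1e-10, max_iter=100000):
    """Return (capacity in bits, maximizing input distribution x).

    P[i][j] = prob(Y = i | X = j); each column of P must sum to one.
    """
    n = len(P[0])
    x = [1.0 / n] * n  # start from the uniform input distribution
    for _ in range(max_iter):
        D = divergence_terms(P, x)
        lower = sum(xj * Dj for xj, Dj in zip(x, D))  # I(X;Y) at current x
        upper = max(D)                                # upper bound on capacity
        if upper - lower < tol:
            break
        # Multiplicative update: x_j <- x_j exp(D_j), then renormalize.
        w = [xj * math.exp(Dj) for xj, Dj in zip(x, D)]
        s = sum(w)
        x = [wj / s for wj in w]
    return lower / math.log(2), x  # convert nats to bits

# Illustrative example (assumed): binary symmetric channel, crossover 0.1.
P_bsc = [[0.9, 0.1],
         [0.1, 0.9]]
C_bsc, x_bsc = channel_capacity(P_bsc)
print(C_bsc, x_bsc)
```

For the binary symmetric channel the result can be compared with the closed form 1 + 0.9 log2 0.9 + 0.1 log2 0.1, attained at the uniform input distribution.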