The Shannon-Hartley theorem describes the maximum rate at which information can be sent over a bandwidth-limited AWGN channel. This rate is called the channel capacity. If \(B\) is the bandwidth of the channel in Hz, \(S\) is the signal power (in units of W), and \(N_0\) is the noise power spectral density (in units of W/Hz), then the channel capacity \(C\) in units of bits per second is\[C = B \log_2\left(1 + \frac{S}{N_0B}\right).\]
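The formula is easy to evaluate numerically. Here is a minimal sketch in Python; the example values (1 MHz bandwidth, noise PSD of \(10^{-9}\) W/Hz, and a signal power giving 10 dB SNR) are chosen arbitrarily for illustration.

```python
import math

def capacity(B, S, N0):
    """Shannon-Hartley capacity in bit/s for bandwidth B (Hz),
    signal power S (W) and noise PSD N0 (W/Hz)."""
    return B * math.log2(1 + S / (N0 * B))

# Example: B = 1 MHz, N0 = 1e-9 W/Hz, S = 1e-2 W, so SNR = S/(N0*B) = 10.
print(capacity(1e6, 1e-2, 1e-9))  # ≈ 3.46e6 bit/s, i.e. B*log2(11)
```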

Let us now consider forming \(n\) “sub-channels” by selecting \(n\) disjoint bandwidth intervals contained in the total bandwidth of the channel. We denote the bandwidth of these sub-channels by \(B_j\), \(j = 1,\ldots,n\). Clearly, we have the constraint \(\sum_{j=1}^n B_j \leq B\). Likewise, we divide our total transmit power \(S\) into the \(n\) sub-channels, allocating power \(S_j\) to the signal in sub-channel \(j\). We have \(\sum_{j=1}^n S_j = S\). Under these conditions, each sub-channel has capacity \(C_j\), given by the formula above with \(B_j\) and \(S_j\) in place of \(B\) and \(S\).

The natural question concerns using the \(n\) sub-channels in parallel to transmit data: what is the maximum of the sum \(\sum_{j=1}^n C_j\) under these constraints, and how can it be achieved? It should be clear from the definition of channel capacity that this sum is always less than or equal to \(C\). After all, by dividing the channel into sub-channels we cannot do any better than by considering it as a whole.
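The inequality \(\sum_{j=1}^n C_j \leq C\) can be sanity-checked numerically by trying random splits of the bandwidth and power. A quick sketch, reusing the arbitrary example values from above:

```python
import math
import random

def capacity(B, S, N0):
    """Shannon-Hartley capacity in bit/s."""
    return B * math.log2(1 + S / (N0 * B))

random.seed(0)
B, S, N0 = 1e6, 1e-2, 1e-9
C = capacity(B, S, N0)

for _ in range(1000):
    # Random split of bandwidth and power into n = 4 sub-channels.
    bw = [random.random() for _ in range(4)]
    pw = [random.random() for _ in range(4)]
    Bj = [B * w / sum(bw) for w in bw]
    Sj = [S * w / sum(pw) for w in pw]
    total = sum(capacity(b, s, N0) for b, s in zip(Bj, Sj))
    assert total <= C + 1e-3  # sum of sub-channel capacities never exceeds C
```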

People used to communications theory might find intuitive that we can achieve \(\sum_{j=1}^n C_j = C\), and that this happens if and only if we use all the bandwidth (\(\sum_{j=1}^n B_j = B\)) and the SNRs of the sub-channels, defined by \(S_j/(N_0B_j)\), are all equal, so that \(S_j = SB_j/B\). After all, this is pretty much how OFDM and other channelized communication methods work. In this post I give an easy proof of this result.

First of all, since for fixed \(S_j\) the expression \(B_j \log_2(1+S_j/(N_0B_j))\) is increasing in \(B_j\), if \(\sum_{j=1}^nB_j < B\), then we can increase some of the values of the \(B_j\) until we have equality, increasing as a consequence the sum \(\sum_{j=1}^n C_j\). Therefore, we can restrict ourselves to the case \(\sum_{j=1}^nB_j = B\).

Now we consider the \(B_j\) as fixed and apply Lagrange multipliers to find the extrema of the function\[f(S_1,\ldots,S_n) = \sum_{j=1}^n B_j \log\left(1 + \frac{S_j}{N_0B_j}\right)\]subject to the condition \(\sum_{j=1}^n S_j = S\). (Here we use the natural logarithm; since \(\log_2 x = \log x / \log 2\), this only rescales the objective by a constant factor and does not change the location of the extrema.) This gives the system of equations\[B_j\left(1+\frac{S_j}{N_0B_j}\right)^{-1}\frac{1}{N_0B_j} + \lambda = 0,\]where \(\lambda\) is the Lagrange multiplier. These equations can be rearranged as\[\frac{S_j}{B_j} = -N_0 - \frac{1}{\lambda}.\] It follows that the quotients \(S_j/B_j\) are all equal.
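For \(n = 2\), the stationarity condition can also be verified by brute force: scanning over the power split (with arbitrary example values) should place the maximum where the quotients \(S_j/B_j\) coincide, i.e. at \(S_1 = S B_1 / B\).

```python
import math

def capacity(B, S, N0):
    """Shannon-Hartley capacity in bit/s."""
    return B * math.log2(1 + S / (N0 * B))

B, S, N0 = 1e6, 1e-2, 1e-9
B1, B2 = 0.3 * B, 0.7 * B   # fixed sub-channel bandwidths

# Scan the power split S1 over a fine grid; the maximizer should sit
# at S1 = S * B1 / B, where S1/B1 = S2/B2.
best_S1 = max((k / 10000 * S for k in range(1, 10000)),
              key=lambda S1: capacity(B1, S1, N0) + capacity(B2, S - S1, N0))
print(best_S1 / S)  # ≈ 0.3 = B1/B
```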

Now we write \(S_j/B_j = K\). We have\[S = \sum_{j=1}^n S_j = \sum_{j=1}^n K B_j = KB.\]Therefore, \(S_j/B_j = S/B\). This implies\[\begin{split}\sum_{j=1}^n C_j &= \sum_{j=1}^n B_j \log_2\left(1 + \frac{S_j}{N_0B_j}\right) = \sum_{j=1}^n B_j \log_2\left(1 + \frac{S}{N_0B}\right) \\&= B \log_2\left(1 + \frac{S}{N_0B}\right) = C.\end{split}\]This is what we wanted to show.
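The conclusion can be checked numerically: allocating power proportionally to bandwidth, \(S_j = S B_j / B\), recovers the full capacity \(C\) for any bandwidth split. A short sketch with the same arbitrary example values:

```python
import math

def capacity(B, S, N0):
    """Shannon-Hartley capacity in bit/s."""
    return B * math.log2(1 + S / (N0 * B))

B, S, N0 = 1e6, 1e-2, 1e-9
Bj = [0.1 * B, 0.25 * B, 0.65 * B]   # any split with sum(Bj) == B
Sj = [S * b / B for b in Bj]         # power proportional to bandwidth

C = capacity(B, S, N0)
total = sum(capacity(b, s, N0) for b, s in zip(Bj, Sj))
print(abs(total - C))  # ≈ 0, up to floating-point rounding
```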

The attentive reader will perhaps have noticed that we also have the restrictions \(S_j \geq 0\), so we need to be a bit careful when applying Lagrange multipliers. However, this is not a problem: if the maximum were attained at a point in which some of the \(S_j\) are zero, we could remove these variables (their corresponding capacities \(C_j\) are zero) and consider a problem with a smaller \(n\) in which all the \(S_j\) are strictly positive.