Channel capacity

{{Short description|Information-theoretical limit on transmission rate in a communication channel}}

{{More citations needed|date=May 2023}}

{{Information theory}}

Channel capacity, in electrical engineering, computer science, and information theory, is the theoretical maximum rate at which information can be reliably transmitted over a communication channel.

By the noisy-channel coding theorem, the channel capacity of a given channel is the highest information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability.{{cite web |url=http://www.cs.ucl.ac.uk/staff/S.Bhatti/D51-notes/node31.html |author=Saleem Bhatti |title=Channel capacity |work=Lecture notes for M.Sc. Data Communication Networks and Distributed Systems D51 -- Basic Communications and Networks |url-status=dead |archive-url=https://web.archive.org/web/20070821212637/http://www.cs.ucl.ac.uk/staff/S.Bhatti/D51-notes/node31.html |archive-date=2007-08-21 }}{{cite web | url = http://www.st-andrews.ac.uk/~www_pa/Scots_Guide/iandm/part8/page1.html | title = Signals look like noise! | author = Jim Lesurf | work = Information and Measurement, 2nd ed.}}

Information theory, developed by Claude E. Shannon in 1948, defines the notion of channel capacity and provides a mathematical model by which it may be computed. The key result states that the capacity of the channel, as defined above, is given by the maximum of the mutual information between the input and output of the channel, where the maximization is with respect to the input distribution.{{cite book| author = Thomas M. Cover, Joy A. Thomas | title = Elements of Information Theory | publisher = John Wiley & Sons, New York |year=2006| isbn = 9781118585771 |url=https://books.google.com/books?id=VWq5GG6ycxMC&q=%22channel+capacity%22}}

The notion of channel capacity has been central to the development of modern wireline and wireless communication systems; modern error-correction coding schemes achieve performance very close to the limits that channel capacity promises.

Formal definition

The basic mathematical model for a communication system is the following:

:\xrightarrow[\text{Message}]{W}
\begin{array}{|c|} \hline \text{Encoder} \\ f_n \\ \hline \end{array}
\xrightarrow[\mathrm{Encoded \atop sequence}]{X^n}
\begin{array}{|c|} \hline \text{Channel} \\ p(y|x) \\ \hline \end{array}
\xrightarrow[\mathrm{Received \atop sequence}]{Y^n}
\begin{array}{|c|} \hline \text{Decoder} \\ g_n \\ \hline \end{array}
\xrightarrow[\mathrm{Estimated \atop message}]{\hat W}

where:

  • W is the message to be transmitted;
  • X is the channel input symbol (X^n is a sequence of n symbols) taken in an alphabet \mathcal{X};
  • Y is the channel output symbol (Y^n is a sequence of n symbols) taken in an alphabet \mathcal{Y};
  • \hat{W} is the estimate of the transmitted message;
  • f_n is the encoding function for a block of length n;
  • p(y|x) = p_{Y|X}(y|x) is the noisy channel, which is modeled by a conditional probability distribution; and,
  • g_n is the decoding function for a block of length n.

Let X and Y be modeled as random variables. Furthermore, let p_{Y|X}(y|x) be the conditional probability distribution function of Y given X, which is an inherent fixed property of the communication channel. Then the choice of the marginal distribution p_X(x) completely determines the joint distribution p_{X,Y}(x,y) due to the identity

:\ p_{X,Y}(x,y)=p_{Y|X}(y|x)\,p_X(x)

which, in turn, induces a mutual information I(X;Y). The channel capacity is defined as

:\ C = \sup_{p_X(x)} I(X;Y)\,

where the supremum is taken over all possible choices of p_X(x).
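
The supremum can be evaluated numerically for small discrete channels. The following sketch is illustrative only: it assumes a binary symmetric channel with crossover probability 0.1, scans candidate input distributions by brute force, and reports the largest mutual information found, which agrees with the known closed form 1 - H_b(0.1).

<syntaxhighlight lang="python">
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for input distribution p(x) and channel matrix p(y|x)."""
    p_xy = p_x[:, None] * p_y_given_x        # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)                   # output marginal p(y)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y)[mask]))

# Illustrative channel: binary symmetric channel with crossover probability 0.1.
eps = 0.1
channel = np.array([[1 - eps, eps],
                    [eps, 1 - eps]])

# Brute-force scan over input distributions (adequate for a binary input alphabet).
best = max(mutual_information(np.array([a, 1 - a]), channel)
           for a in np.linspace(0.001, 0.999, 999))

closed_form = 1 - (-eps * np.log2(eps) - (1 - eps) * np.log2(1 - eps))
print(best, closed_form)   # both approximately 0.531 bits per channel use
</syntaxhighlight>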

Additivity of channel capacity

Channel capacity is additive over independent channels.{{cite book |last1=Cover |first1=Thomas M. |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |publisher=Wiley-Interscience |edition=Second |date=2006 |pages=206–207 |chapter=Chapter 7: Channel Capacity |isbn=978-0-471-24195-9}} That is, using two independent channels in a combined manner provides the same theoretical capacity as using them independently.

More formally, let p_{1} and p_{2} be two independent channels modelled as above; p_{1} having an input alphabet \mathcal{X}_{1} and an output alphabet \mathcal{Y}_{1}, and likewise p_{2} with \mathcal{X}_{2} and \mathcal{Y}_{2}.

We define the product channel p_{1}\times p_2 as

\forall (x_{1}, x_{2}) \in \mathcal{X}_{1}\times\mathcal{X}_{2},\;(y_{1}, y_{2}) \in \mathcal{Y}_{1}\times\mathcal{Y}_{2},\; (p_{1}\times p_{2})((y_{1}, y_{2}) | (x_{1},x_{2}))=p_{1}(y_{1}|x_{1})p_{2}(y_{2}|x_{2})

This theorem states:

C(p_{1}\times p_{2}) = C(p_{1}) + C(p_{2})

{{Proof|

We first show that C(p_{1}\times p_{2}) \geq C(p_{1}) + C(p_{2}) .

Let X_1 and X_2 be two independent random variables. Let Y_1 be a random variable corresponding to the output of X_1 through the channel p_{1}, and Y_2 for X_2 through p_2.

By definition C(p_{1}\times p_{2}) = \sup_{p_{X_{1},X_{2}}}(I(X_{1},X_{2} : Y_{1},Y_{2})).

Since X_1 and X_2 are independent, and the channels p_1 and p_2 act independently, (X_1,Y_1) is independent of (X_2,Y_2). We can therefore apply the following property of mutual information: I(X_1,X_2 : Y_1, Y_2) = I(X_1:Y_1) + I(X_2:Y_2)

It suffices to find a distribution p_{X_1,X_2} such that I(X_1,X_2 : Y_1,Y_2) \geq I(X_1 : Y_1) + I(X_2 : Y_2). In fact, taking X_1 and X_2 independent with distributions \pi_1 and \pi_2 achieving C(p_1) and C(p_2) suffices:

:C(p_{1}\times p_{2}) \geq I(X_1, X_2 : Y_1, Y_2) = I(X_1:Y_1) + I(X_2:Y_2) = C(p_1) + C(p_2)

i.e., C(p_{1}\times p_{2}) \geq C(p_1) + C(p_2).

Now let us show that C(p_{1}\times p_{2}) \leq C(p_{1}) + C(p_{2}) .

Let \pi_{12} be some distribution for the channel p_{1}\times p_{2} defining (X_1, X_2) and the corresponding output (Y_1, Y_2). Let \mathcal{X}_1 be the alphabet of X_1, \mathcal{Y}_1 for Y_1, and analogously \mathcal{X}_2 and \mathcal{Y}_2.

By definition of mutual information, we have

\begin{align}

I(X_1, X_2 : Y_1, Y_2) &= H(Y_1, Y_2) - H(Y_1, Y_2 | X_1, X_2)\\

&\leq H(Y_1) + H(Y_2) - H(Y_1, Y_2 | X_1, X_2)

\end{align}

Let us rewrite the conditional entropy term.

H(Y_1,Y_2|X_1,X_2) = \sum_{(x_1, x_2) \in \mathcal{X}_1\times \mathcal{X}_2}\mathbb{P}(X_{1}, X_{2} = x_{1}, x_{2})H(Y_{1}, Y_{2} | X_{1}, X_{2} = x_{1}, x_{2})

By definition of the product channel, \mathbb{P}(Y_{1},Y_{2}=y_{1},y_{2}|X_{1},X_{2}=x_{1},x_{2})=\mathbb{P}(Y_{1}=y_{1}|X_{1}=x_{1})\mathbb{P}(Y_{2}=y_{2}|X_{2}=x_{2}).

For a given pair (x_1, x_2), we can rewrite H(Y_1,Y_2|X_1,X_2=x_1,x_2) as:

\begin{align}

H(Y_1, Y_2 | X_1, X_2 = x_1,x_2) &= -\sum_{(y_1, y_2) \in \mathcal{Y}_1\times \mathcal{Y}_2}\mathbb{P}(Y_1, Y_2 = y_1, y_2 | X_1, X_2 = x_1, x_2)\log(\mathbb{P}(Y_1, Y_2 = y_1, y_2 | X_1, X_2 = x_1, x_2)) \\

&= -\sum_{(y_1, y_2) \in \mathcal{Y}_1\times \mathcal{Y}_2}\mathbb{P}(Y_1, Y_2 = y_1, y_2 | X_1, X_2 = x_1, x_2)[\log(\mathbb{P}(Y_1 = y_1 | X_1 = x_1)) + \log(\mathbb{P}(Y_2 = y_2 | X_2 = x_2))] \\

&=H(Y_{1}|X_{1}=x_1)+H(Y_{2}|X_{2}=x_2)

\end{align}

By averaging this equality over all (x_1, x_2), weighted by \mathbb{P}(X_{1}, X_{2} = x_{1}, x_{2}), we obtain

H(Y_1,Y_2|X_1,X_2)=H(Y_1|X_1)+H(Y_2|X_2).

We can now give an upper bound over mutual information:

\begin{align}

I(X_{1},X_{2}:Y_{1},Y_{2})&\leq H(Y_{1})+H(Y_{2})-H(Y_{1}|X_{1})-H(Y_{2}|X_{2})\\

&=I(X_{1}:Y_{1})+I(X_{2}:Y_{2})

\end{align}

This relation is preserved at the supremum. Therefore

:C(p_{1}\times p_{2}) \leq C(p_1)+C(p_2)

Combining the two inequalities we proved, we obtain the result of the theorem:

:C(p_{1}\times p_{2})=C(p_{1})+C(p_{2})

}}
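
The additivity theorem can also be checked numerically. The sketch below is illustrative only: it assumes two binary symmetric channels, builds the product channel as a Kronecker product, and uses a generic numerical optimizer (rather than a purpose-built capacity algorithm) to compare the product-channel capacity with the sum of the individual capacities.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def mi_bits(p_x, W):
    """Mutual information I(X;Y) in bits for input p_x and channel matrix W[x, y] = p(y|x)."""
    p_xy = p_x[:, None] * W
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y)[mask])))

def capacity(W):
    """Capacity in bits via numerical maximization over the input simplex (softmax parametrization)."""
    k = W.shape[0]
    def neg_mi(theta):
        p = np.exp(theta - theta.max())
        p /= p.sum()
        return -mi_bits(p, W)
    # A few random restarts of a derivative-free optimizer are enough for this small example.
    best = min(minimize(neg_mi, np.random.randn(k), method="Nelder-Mead").fun
               for _ in range(5))
    return -best

# Illustrative channels: two binary symmetric channels with different crossover probabilities.
def bsc(eps):
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

W1, W2 = bsc(0.1), bsc(0.2)
W12 = np.kron(W1, W2)   # product channel: inputs and outputs are pairs (x1, x2), (y1, y2)
print(capacity(W1) + capacity(W2), capacity(W12))   # both approximately 0.81 bits
</syntaxhighlight>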

Shannon capacity of a graph

{{main|Shannon capacity of a graph}}

If G is an undirected graph, it can be used to define a communications channel in which the symbols are the graph vertices, and two codewords may be confused with each other if their symbols in each position are equal or adjacent. The computational complexity of finding the Shannon capacity of such a channel remains open, but it can be upper bounded by another important graph invariant, the Lovász number.{{citation | first = László | last = Lovász | author-link = László Lovász | title = On the Shannon Capacity of a Graph | journal = IEEE Transactions on Information Theory | volume = IT-25 | issue = 1 | year = 1979 | pages = 1–7 | doi = 10.1109/tit.1979.1055985 }}.

Noisy-channel coding theorem

The noisy-channel coding theorem states that for any error probability ε > 0 and for any transmission rate R less than the channel capacity C, there is an encoding and decoding scheme transmitting data at rate R whose error probability is less than ε, for a sufficiently large block length. Conversely, for any rate greater than the channel capacity, the probability of error at the receiver is bounded away from zero, and by the strong converse the block error probability tends to one as the block length goes to infinity.

Example application

An application of the channel capacity concept to an additive white Gaussian noise (AWGN) channel with B Hz bandwidth and signal-to-noise ratio S/N is the Shannon–Hartley theorem:

: C = B \log_2 \left( 1+\frac{S}{N} \right)\

C is measured in bits per second if the logarithm is taken in base 2, or nats per second if the natural logarithm is used, assuming B is in hertz; the signal and noise powers S and N are expressed in the same linear power unit (such as watts or volts squared). Since S/N figures are often cited in dB, a conversion may be needed. For example, a signal-to-noise ratio of 30 dB corresponds to a linear power ratio of 10^{30/10} = 10^3 = 1000.
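
As a worked example, the dB conversion and the capacity formula can be combined as follows; this is a sketch, and the 3 kHz bandwidth is an assumed, illustrative value rather than one taken from the text above.

<syntaxhighlight lang="python">
import math

def shannon_hartley(bandwidth_hz, snr_db):
    """Channel capacity in bits per second for an AWGN channel (Shannon-Hartley theorem)."""
    snr_linear = 10 ** (snr_db / 10)          # convert dB to a linear power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

# Illustrative example: a 3 kHz channel at 30 dB SNR (linear power ratio 1000).
print(shannon_hartley(3000, 30))   # about 29,902 bits per second
</syntaxhighlight>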

Channel capacity estimation

To determine the channel capacity, it is necessary to find the capacity-achieving distribution p_X(x) and evaluate the mutual information I(X;Y). Research has mostly focused on additive noise channels under certain power constraints and noise distributions, as analytical methods are not feasible in the majority of other scenarios. Hence, alternative approaches such as investigation of the input support,{{Cite journal |last=Smith |first=Joel G. |date=1971 |title=The information capacity of amplitude- and variance-constrained scalar Gaussian channels |url=https://linkinghub.elsevier.com/retrieve/pii/S0019995871903469 |journal=Information and Control |language=en |volume=18 |issue=3 |pages=203–219 |doi=10.1016/S0019-9958(71)90346-9}} relaxations{{Cite journal |last1=Huang |first1=J. |last2=Meyn |first2=S.P. |date=2005 |title=Characterization and Computation of Optimal Distributions for Channel Coding |url=https://ieeexplore.ieee.org/document/1459046 |journal=IEEE Transactions on Information Theory |language=en |volume=51 |issue=7 |pages=2336–2351 |doi=10.1109/TIT.2005.850108 |s2cid=2560689 |issn=0018-9448}} and capacity bounds{{Cite book |last=McKellips |first=A.L. |chapter=Simple tight bounds on capacity for the peak-limited discrete-time channel |date=2004 |title=International Symposium on Information Theory, 2004. ISIT 2004. Proceedings. |chapter-url=https://ieeexplore.ieee.org/document/1365385 |publisher=IEEE |pages=348 |doi=10.1109/ISIT.2004.1365385 |isbn=978-0-7803-8280-0|s2cid=41462226 }} have been proposed in the literature.

The capacity of a discrete memoryless channel can be computed using the Blahut–Arimoto algorithm.
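
A minimal sketch of the Blahut–Arimoto iteration for a channel given as a row-stochastic matrix p(y|x) follows; the function and variable names are illustrative, not taken from any particular library.

<syntaxhighlight lang="python">
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
    """Capacity (bits per channel use) and capacity-achieving input distribution
    of a discrete memoryless channel W[x, y] = p(y|x), via the Blahut-Arimoto iteration."""
    n_in = W.shape[0]
    r = np.full(n_in, 1.0 / n_in)                 # current input distribution
    for _ in range(max_iter):
        q = r[:, None] * W                        # unnormalized posterior
        q /= q.sum(axis=0, keepdims=True)         # q(x|y)
        # Update: r(x) proportional to exp( sum_y p(y|x) log q(x|y) )
        log_r = np.sum(W * np.log(q, where=q > 0, out=np.zeros_like(q)), axis=1)
        r_new = np.exp(log_r - log_r.max())
        r_new /= r_new.sum()
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    # Capacity = I(X;Y) under the final input distribution, in bits
    p_xy = r[:, None] * W
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    C = np.sum(p_xy[mask] * np.log2(p_xy[mask] / (r[:, None] * p_y)[mask]))
    return C, r

# Illustrative example: binary symmetric channel with crossover probability 0.2.
C, r = blahut_arimoto(np.array([[0.8, 0.2], [0.2, 0.8]]))
print(C, r)   # about 0.278 bits per channel use, with a uniform input distribution
</syntaxhighlight>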

Deep learning can be used to estimate the channel capacity. In fact, the channel capacity and the capacity-achieving distribution of any discrete-time continuous memoryless vector channel can be obtained using CORTICAL,{{Cite journal |last1=Letizia |first1=Nunzio A. |last2=Tonello |first2=Andrea M. |last3=Poor |first3=H. Vincent |date=2023 |title=Cooperative Channel Capacity Learning |url=https://ieeexplore.ieee.org/document/10143184 |journal=IEEE Communications Letters |volume=27 |issue=8 |pages=1984–1988 |doi=10.1109/LCOMM.2023.3282307 |issn=1089-7798|arxiv=2305.13493 }} a cooperative framework inspired by generative adversarial networks. CORTICAL consists of two cooperative networks: a generator that learns to sample from the capacity-achieving input distribution, and a discriminator that learns to distinguish paired from unpaired channel input-output samples and thereby estimates I(X;Y).

Channel capacity in wireless communications

This section{{citation | author = David Tse, Pramod Viswanath | title = Fundamentals of Wireless Communication | publisher = Cambridge University Press, UK | year=2005| isbn = 9780521845274 |url=https://books.google.com/books?id=66XBb5tZX6EC&q=%22Channel+capacity%22}} focuses on the single-antenna, point-to-point scenario. For channel capacity in systems with multiple antennas, see the article on MIMO.

=Bandlimited AWGN channel=

{{main|Shannon–Hartley theorem}}

File:Channel Capacity with Power- and Bandwidth-Limited Regimes.png

If the average received power is \bar{P} [W], the total bandwidth is W in hertz, and the noise power spectral density is N_0 [W/Hz], the AWGN channel capacity is

:C_{\text{AWGN}}=W\log_2\left(1+\frac{\bar{P}}{N_0 W}\right) [bits/s],

where \frac{\bar{P}}{N_0 W} is the received signal-to-noise ratio (SNR). This result is known as the Shannon–Hartley theorem.{{cite book|title=The Handbook of Electrical Engineering|year=1996|publisher=Research & Education Association|isbn=9780878919819|page=D-149|url=https://books.google.com/books?id=-WJS3VnvomIC&q=%22Shannon%E2%80%93Hartley+theorem%22&pg=RA1-SL4-PA41}}

When the SNR is large (SNR ≫ 0 dB), the capacity C\approx W\log_2 \frac{\bar{P}}{N_0 W} is logarithmic in power and approximately linear in bandwidth. This is called the bandwidth-limited regime.

When the SNR is small (SNR ≪ 0 dB), the capacity C\approx \frac{\bar{P}}{N_0 \ln 2} is linear in power but insensitive to bandwidth. This is called the power-limited regime.

The bandwidth-limited regime and power-limited regime are illustrated in the figure.
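
Both approximations can be checked numerically. The sketch below uses illustrative values (\bar{P} = 1 W, N_0 = 10^{-9} W/Hz, and two assumed bandwidths) and compares the exact capacity with the high-SNR and low-SNR approximations.

<syntaxhighlight lang="python">
import numpy as np

P = 1.0          # average received power, watts (illustrative value)
N0 = 1e-9        # noise power spectral density, W/Hz (illustrative value)

def awgn_capacity(W_hz):
    """Exact AWGN capacity in bits per second."""
    return W_hz * np.log2(1 + P / (N0 * W_hz))

# Bandwidth-limited regime (large SNR): C is approximately W * log2(P / (N0 * W)).
W_small = 1e6    # 1 MHz bandwidth gives SNR = 1000 (30 dB)
print(awgn_capacity(W_small), W_small * np.log2(P / (N0 * W_small)))

# Power-limited regime (small SNR): C approaches P / (N0 * ln 2) as bandwidth grows.
W_large = 1e12   # very large bandwidth gives SNR = 0.001 (-30 dB)
print(awgn_capacity(W_large), P / (N0 * np.log(2)))
</syntaxhighlight>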

=Frequency-selective AWGN channel=

The capacity of the frequency-selective channel is given by the so-called water-filling power allocation,

:C_{N_c}=\sum_{n=0}^{N_c-1} \log_2 \left(1+\frac{P_n^* |\bar{h}_n|^2}{N_0} \right),

where P_n^*=\max \left\{ \left(\frac{1}{\lambda}-\frac{N_0}{|\bar{h}_n|^2} \right),0 \right\} and |\bar{h}_n|^2 is the gain of subchannel n, with \lambda chosen to meet the power constraint.
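
A minimal water-filling sketch follows; the subchannel gains and power budget are illustrative values rather than values from the text, and the water level 1/\lambda is found by bisection so that the power constraint is met.

<syntaxhighlight lang="python">
import numpy as np

def water_filling(gains, P_total, N0=1.0):
    """Water-filling power allocation: P_n = max(1/lambda - N0/|h_n|^2, 0),
    with the water level 1/lambda chosen so that sum(P_n) = P_total."""
    noise_over_gain = N0 / gains
    lo, hi = 0.0, noise_over_gain.max() + P_total      # bracket for the water level
    for _ in range(100):                               # bisection on the water level
        level = 0.5 * (lo + hi)
        powers = np.maximum(level - noise_over_gain, 0.0)
        if powers.sum() > P_total:
            hi = level
        else:
            lo = level
    powers = np.maximum(0.5 * (lo + hi) - noise_over_gain, 0.0)
    capacity = np.sum(np.log2(1 + powers * gains / N0))   # bits per N_c-subchannel use
    return powers, capacity

# Illustrative example: four subchannels with unequal gains |h_n|^2 and a total power budget of 4.
gains = np.array([2.0, 1.0, 0.5, 0.1])
powers, C = water_filling(gains, P_total=4.0)
print(powers, C)
</syntaxhighlight>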

=Slow-fading channel=

In a slow-fading channel, where the coherence time is greater than the latency requirement, there is no definite capacity as the maximum rate of reliable communications supported by the channel, \log_2 (1+|h|^2 SNR), depends on the random channel gain |h|^2, which is unknown to the transmitter. If the transmitter encodes data at rate R [bits/s/Hz], there is a non-zero probability that the decoding error probability cannot be made arbitrarily small,

:p_{out}=\mathbb{P}\left(\log_2(1+|h|^2 SNR)<R\right),

in which case the system is said to be in outage. With a non-zero probability that the channel is in deep fade, the capacity of the slow-fading channel in strict sense is zero. However, it is possible to determine the largest value of R such that the outage probability p_{out} is less than \epsilon. This value is known as the \epsilon-outage capacity.
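
Under the common assumption of Rayleigh fading, where |h|^2 is exponentially distributed with unit mean, the outage probability has a closed form and the \epsilon-outage capacity can be computed directly. The sketch below assumes that model, and the SNR and target outage values are illustrative.

<syntaxhighlight lang="python">
import math

def outage_capacity_rayleigh(snr_linear, epsilon):
    """epsilon-outage capacity in bits/s/Hz for Rayleigh fading,
    where |h|^2 is exponentially distributed with unit mean."""
    # p_out(R) = P(|h|^2 < (2^R - 1)/SNR) = 1 - exp(-(2^R - 1)/SNR)
    # Solving p_out(R) = epsilon for R gives:
    return math.log2(1 + snr_linear * (-math.log(1 - epsilon)))

# Illustrative example: 10 dB SNR, 1% target outage probability.
snr = 10 ** (10 / 10)
print(outage_capacity_rayleigh(snr, 0.01))   # about 0.14 bits/s/Hz
</syntaxhighlight>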

=Fast-fading channel=

In a fast-fading channel, where the latency requirement is greater than the coherence time and the codeword length spans many coherence periods, one can average over many independent channel fades by coding over a large number of coherence time intervals. Thus, it is possible to achieve a reliable rate of communication of \mathbb{E}(\log_2 (1+|h|^2 SNR)) [bits/s/Hz] and it is meaningful to speak of this value as the capacity of the fast-fading channel.
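
The fast-fading (ergodic) capacity can be estimated by Monte Carlo averaging over channel realizations. The sketch below assumes Rayleigh fading (|h|^2 exponential with unit mean) and an illustrative 10 dB average SNR.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def ergodic_capacity(snr_linear, n_samples=1_000_000):
    """Monte Carlo estimate of E[log2(1 + |h|^2 SNR)] in bits/s/Hz for Rayleigh fading."""
    h2 = rng.exponential(1.0, n_samples)       # |h|^2 samples, exponential with unit mean
    return np.mean(np.log2(1 + h2 * snr_linear))

snr = 10 ** (10 / 10)                          # 10 dB average SNR
print(ergodic_capacity(snr))                   # about 2.9 bits/s/Hz
print(np.log2(1 + snr))                        # about 3.46 bits/s/Hz for a non-fading AWGN channel
</syntaxhighlight>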

Feedback capacity

Feedback capacity is the greatest rate at which information can be reliably transmitted, per unit time, over a point-to-point communication channel in which the receiver feeds back the channel outputs to the transmitter. The information-theoretic analysis of communication systems that incorporate feedback is more complicated and challenging than that of systems without feedback. This may be why C. E. Shannon chose feedback as the subject of the first Shannon Lecture, delivered at the 1973 IEEE International Symposium on Information Theory in Ashkelon, Israel.

The feedback capacity is characterized by the maximum of the directed information between the channel inputs and the channel outputs, where the maximization is with respect to the causal conditioning of the input given the output. The term directed information was coined by James Massey{{Cite journal |last=Massey |first=James |date=Nov 1990 |title=Causality, Feedback and Directed Information |url=http://www.isiweb.ee.ethz.ch/archive/massey_pub/pdf/BI532.pdf |journal=Proc. 1990 Int. Symp. On Information Theory and Its Applications (ISITA-90), Waikiki, HI. |pages=303–305}} in 1990, who showed that it is an upper bound on the feedback capacity. For memoryless channels, Shannon showed{{cite journal |last1=Shannon |first1=C. |title=The zero error capacity of a noisy channel |journal=IEEE Transactions on Information Theory |date=September 1956 |volume=2 |issue=3 |pages=8–19 |doi=10.1109/TIT.1956.1056798}} that feedback does not increase the capacity, and the feedback capacity coincides with the channel capacity characterized by the mutual information between the input and the output. The feedback capacity is known in closed form only for a few examples, such as the trapdoor channel{{Cite journal |last1=Permuter |first1=Haim |last2=Cuff |first2=Paul |last3=Van Roy |first3=Benjamin |last4=Weissman |first4=Tsachy |date=July 2008 |title=Capacity of the trapdoor channel with feedback |url=https://www.ee.bgu.ac.il/~haimp/trapdoor_channel_it.pdf |journal=IEEE Trans. Inf. Theory |volume=54 |issue=7 |pages=3150–3165|doi=10.1109/TIT.2008.924681 |arxiv=cs/0610047 |s2cid=1265 }} and the Ising channel.{{cite journal |last1=Elishco |first1=Ohad |last2=Permuter |first2=Haim |title=Capacity and Coding for the Ising Channel With Feedback |journal=IEEE Transactions on Information Theory |date=September 2014 |volume=60 |issue=9 |pages=5138–5149 |doi=10.1109/TIT.2014.2331951|arxiv=1205.4674 |s2cid=9761759 }}{{cite journal |last1=Aharoni |first1=Ziv |last2=Sabag |first2=Oron |last3=Permuter |first3=Haim H. |title=Feedback Capacity of Ising Channels With Large Alphabet via Reinforcement Learning |journal=IEEE Transactions on Information Theory |date=September 2022 |volume=68 |issue=9 |pages=5637–5656 |doi=10.1109/TIT.2022.3168729|s2cid=248306743 }} For some other channels, it is characterized through constant-size optimization problems, such as the binary erasure channel with a no-consecutive-ones input constraint{{cite journal |last1=Sabag |first1=Oron |last2=Permuter |first2=Haim H. |last3=Kashyap |first3=Navin |date=2016 |title=The Feedback Capacity of the Binary Erasure Channel With a No-Consecutive-Ones Input Constraint |journal=IEEE Transactions on Information Theory |volume=62 |issue=1 |pages=8–22 |doi=10.1109/TIT.2015.2495239}} and the NOST channel.{{cite journal |last1=Shemuel |first1=Eli |last2=Sabag |first2=Oron |last3=Permuter |first3=Haim H. |date=2022 |title=The Feedback Capacity of Noisy Output Is the State (NOST) Channels |journal=IEEE Transactions on Information Theory |volume=68 |issue=8 |pages=5044–5059 |doi=10.1109/TIT.2022.3165538|arxiv=2107.07164 }}

The basic mathematical model for a communication system is the following:

File:Communication with feedback.png

Here is the formal definition of each element (where the only difference with respect to the nonfeedback capacity is the encoder definition):

  • W is the message to be transmitted, taken in an alphabet \mathcal{W};
  • X is the channel input symbol (X^n is a sequence of n symbols) taken in an alphabet \mathcal{X};
  • Y is the channel output symbol (Y^n is a sequence of n symbols) taken in an alphabet \mathcal{Y};
  • \hat{W} is the estimate of the transmitted message;
  • f_i: \mathcal{W} \times \mathcal{Y}^{i-1} \to \mathcal{X} is the encoding function at time i, for a block of length n;
  • p(y_i|x^i,y^{i-1}) = p_{Y_i|X^i,Y^{i-1}}(y_i|x^i,y^{i-1}) is the noisy channel at time i, which is modeled by a conditional probability distribution; and,
  • \hat{w}: \mathcal{Y}^n \to \mathcal{W} is the decoding function for a block of length n.

That is, at each time i the previous output Y_{i-1} is fed back to the encoder, so that the encoder has access to all previous outputs Y^{i-1}. A (2^{nR},n) code is a pair of encoding and decoding mappings with \mathcal{W}=[1,2,\dots, 2^{nR}], and W is uniformly distributed. A rate R is said to be achievable if there exists a sequence of codes (2^{nR},n) such that the average probability of error P_e^{(n)}\triangleq \Pr (\hat{W}\neq W) tends to zero as n\to \infty.

The feedback capacity is denoted by C_{\text{feedback}}, and is defined as the supremum over all achievable rates.

= Main results on feedback capacity =

Let X and Y be modeled as random variables. The causal conditioning P(y^n||x^n) \triangleq \prod_{i=1}^n P(y_i|y^{i-1},x^{i}) describes the given channel. The choice of the causally conditional distribution P(x^n||y^{n-1}) \triangleq \prod_{i=1}^n P(x_i|x^{i-1},y^{i-1}) determines the joint distribution p_{X^n,Y^n}(x^n,y^n) due to the chain rule for causal conditioning{{cite journal |last1=Permuter |first1=Haim Henry |last2=Weissman |first2=Tsachy |last3=Goldsmith |first3=Andrea J. |date=February 2009 |title=Finite State Channels With Time-Invariant Deterministic Feedback |journal=IEEE Transactions on Information Theory |volume=55 |issue=2 |pages=644–662 |arxiv=cs/0608070 |doi=10.1109/TIT.2008.2009849 |s2cid=13178}} P(y^n, x^n) = P(y^n||x^n) P(x^n||y^{n-1}), which, in turn, induces the directed information

:I(X^n \rightarrow Y^n)=\mathbf{E}\left[ \log \frac{P(Y^n||X^n)}{P(Y^n)} \right].

The feedback capacity is given by

: \ C_{\text{feedback}} = \lim_{n \to \infty} \frac{1}{n} \sup_{P_{X^n||Y^{n-1}}} I(X^n \to Y^n)\, ,

where the supremum is taken over all possible choices of P_{X^n||Y^{n-1}}(x^n||y^{n-1}).

= Gaussian feedback capacity =

When the Gaussian noise is colored, the channel has memory. Consider, for instance, the simple case of an autoregressive noise process z_i = z_{i-1}+w_i, where w_i\sim N(0,1) is an i.i.d. process.

= Solution techniques =

The feedback capacity is difficult to compute in the general case. For discrete channels, some solution techniques draw on tools from control theory and Markov decision processes.

See also

=Advanced Communication Topics=

References

{{reflist}}

{{Mobile phones}}


Category:Information theory

Category:Telecommunication theory

Category:Television terminology