Information Theory
B1.14
State the formula for the capacity of a memoryless channel.
(a) Consider a memoryless channel where there are two input symbols, and , and three output symbols, , *. Suppose each input symbol is left intact with probability , and transformed into a with probability . Write down the channel matrix, and calculate the capacity.
(b) Now suppose the output is further processed by someone who cannot distinguish and , so that the matrix becomes
Calculate the new capacity.
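As an illustration of the capacity calculation in (a), the following Python sketch maximises the mutual information over input distributions by a grid search. It assumes a binary erasure channel with inputs 0 and 1, outputs 0, 1, * and an erasure probability p = 0.2 chosen purely for the example; part (b) can be handled the same way after merging the appropriate output columns of the matrix.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and channel matrix P(y|x)."""
    p_xy = p_x[:, None] * channel            # joint distribution P(x, y)
    p_y = p_xy.sum(axis=0)                   # output marginal
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] *
                        np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask])))

def capacity(channel, grid=10001):
    """Capacity of a two-input memoryless channel by grid search over P(X=0)."""
    qs = np.linspace(0.0, 1.0, grid)
    return max(mutual_information(np.array([q, 1.0 - q]), channel) for q in qs)

p = 0.2                                      # illustrative erasure probability
erasure = np.array([[1 - p, 0.0, p],         # P(y | x=0) over outputs (0, 1, *)
                    [0.0, 1 - p, p]])        # P(y | x=1)
print(capacity(erasure), 1 - p)              # both close to 1 - p
```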
B2.14
For integer-valued random variables and , define the relative entropy of relative to .
Prove that , with equality if and only if for all .
By considering , a geometric random variable with parameter chosen appropriately, show that if the mean , then
with equality if is geometric.
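A numerical sketch of the maximum-entropy property asserted here: among distributions on the positive integers with a given mean, the geometric distribution has the largest entropy. The support {1, 2, ...}, the mean value 3 and the comparison distributions are assumptions chosen only for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

mu = 3.0                          # target mean, support assumed to be {1, 2, 3, ...}
N = 2000                          # truncation point for the numerics
k = np.arange(1, N + 1)

# Geometric distribution on {1, 2, ...} with mean mu: P(k) = (1/mu) (1 - 1/mu)^(k-1).
geom = (1 / mu) * (1 - 1 / mu) ** (k - 1)

# Two other distributions with the same mean.
a = np.zeros(N); a[[0, 4]] = 0.5      # mass 1/2 on 1 and 5
b = np.zeros(N); b[[1, 3]] = 0.5      # mass 1/2 on 2 and 4

for name, p in [("geometric", geom), ("1/2 on {1,5}", a), ("1/2 on {2,4}", b)]:
    print(name, round(float((p * k).sum()), 3), round(entropy(p), 3))
# The geometric distribution attains the largest entropy among these mean-3 examples.
```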
B4.13
Define a cyclic code of length .
Show how codewords can be identified with polynomials in such a way that cyclic codes correspond to ideals in the polynomial ring with a suitably chosen multiplication rule.
Prove that any cyclic code has a unique generator, i.e. a polynomial of minimum degree, such that the code consists of the multiples of this polynomial. Prove that the rank of the code equals , and show that divides .
Let be a cyclic code. Set
(the dual code). Prove that is cyclic and establish how the generators of and are related to each other.
Show that the repetition and parity codes are cyclic, and determine their generators.
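A short sketch of the polynomial picture of cyclic codes, with arithmetic over GF(2) modulo x^n - 1. The length n = 7 and the generator g(x) = 1 + x + x^3 (a divisor of x^7 + 1) are assumptions chosen for illustration; the code consists of all multiples of g, has rank n - deg g, and is closed under cyclic shifts.

```python
import numpy as np
from itertools import product

n = 7

def poly_to_word(coeffs, n=n):
    """Coefficients of c0 + c1 x + ... -> length-n word, reduced mod x^n - 1 over GF(2)."""
    w = np.zeros(n, dtype=int)
    for i, c in enumerate(coeffs):
        w[i % n] ^= c & 1
    return tuple(w)

def multiply(a, b):
    """Product of two GF(2) polynomials given as coefficient tuples."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

g = (1, 1, 0, 1)                    # illustrative generator g(x) = 1 + x + x^3, divides x^7 + 1

# The code is the set of multiples m(x) g(x) with deg m < n - deg g, so its rank is n - deg g = 4.
code = {poly_to_word(multiply(m, g)) for m in product([0, 1], repeat=n - 3)}
print(len(code))                    # 2**4 = 16 codewords

# Cyclic property: multiplying by x mod x^n - 1 is a cyclic shift, and stays in the code.
print(all(tuple(np.roll(c, 1)) in code for c in code))

# The parity-check (even-weight) code is cyclic with generator 1 + x;
# the repetition code is cyclic with generator 1 + x + ... + x^(n-1).
```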
B1.14
A binary Huffman code is used for encoding symbols occurring with probabilities where . Let be the length of a shortest codeword and of a longest codeword. Determine the maximal and minimal values of and , and find binary trees for which they are attained.
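The two extremes can be seen numerically. The sketch below builds a binary Huffman code and reports the shortest and longest codeword lengths; the dyadic and near-uniform distributions are assumptions chosen to exhibit, respectively, a path-shaped tree (longest codeword of length n - 1) and a balanced tree.

```python
import heapq

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]   # (weight, symbols in the subtree)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for s in s1 + s2:                            # every merge adds one bit to these symbols
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

dyadic = [2**-1, 2**-2, 2**-3, 2**-4, 2**-4]   # path-shaped tree: longest codeword has length n - 1
uniform = [0.2] * 5                            # balanced tree: lengths as equal as possible

for probs in (dyadic, uniform):
    L = huffman_lengths(probs)
    print(min(L), max(L), L)
```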
B2.14
Let be a binary linear code of length , rank and distance . Let be a codeword with exactly non-zero digits.
(a) Prove that (the Singleton bound).
(b) Prove that truncating on the non-zero digits of produces a code of length , rank and distance for some . Here is the integer satisfying .
[Hint: Assume the opposite. Then, given and its truncation , consider the coordinates where and have 1 in common (i.e. ) and where they differ e.g. and .]
(c) Deduce that (an improved Singleton bound).
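A brute-force check of the Singleton bound in (a) for a small binary linear code. The generator matrix below (a systematic generator of the [7, 4] Hamming code) is an assumption chosen for illustration; the code's distance equals its minimum non-zero weight because the code is linear.

```python
import numpy as np
from itertools import product

G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
n, k = G.shape[1], G.shape[0]

# All 2^k codewords; the distance of a linear code is its minimum non-zero weight.
codewords = [tuple(np.mod(np.array(m) @ G, 2)) for m in product([0, 1], repeat=k)]
d = min(sum(c) for c in codewords if any(c))

print(n, k, d)                  # 7 4 3
print(d <= n - k + 1)           # the Singleton bound d <= n - k + 1
```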
B4.13
State and prove the Fano and generalized Fano inequalities.
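A numerical check of Fano's inequality, H(X|Y) <= h(P_e) + P_e log(|X| - 1), for a randomly chosen joint distribution and the maximum a posteriori estimator of X from Y; the alphabet size and the random seed are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

m = 4                                            # alphabet sizes |X| = |Y| = 4
p_xy = rng.random((m, m)); p_xy /= p_xy.sum()    # random joint distribution P(x, y)

p_y = p_xy.sum(axis=0)
H_X_given_Y = -np.sum(p_xy * np.log2(p_xy / p_y))      # H(X|Y)

# Guess X by the maximum a posteriori symbol; P_e is the resulting error probability.
P_e = 1.0 - p_xy.max(axis=0).sum()

print(H_X_given_Y, h(P_e) + P_e * np.log2(m - 1))      # Fano: left-hand side <= right-hand side
```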
B1.14
(a) Define the entropy and the mutual entropy of random variables and . Prove the inequality
[You may assume the Gibbs inequality.]
(b) Let be a random variable and let be a random vector.
(i) Prove or disprove by producing a counterexample the inequality
first under the assumption that are independent random variables, and then under the assumption that are conditionally independent given .
(ii) Prove or disprove by producing a counterexample the inequality
first under the assumption that are independent random variables, and then under the assumption that are conditionally independent given .
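For reference, a short sketch computing the standard quantities involved here for a randomly chosen joint distribution: the marginal entropies, the joint entropy, and the mutual information, together with the basic consequences of the Gibbs inequality (non-negativity of the mutual information and its bound by either marginal entropy). The particular distribution is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def H(p):
    """Shannon entropy in bits of a distribution given as an array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

p_xy = rng.random((3, 4)); p_xy /= p_xy.sum()      # a random joint distribution of (X, Y)
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

I = H(p_x) + H(p_y) - H(p_xy.ravel())              # mutual information I(X;Y)
print(H(p_x), H(p_y), H(p_xy.ravel()), I)
print(I >= 0, I <= min(H(p_x), H(p_y)))            # consequences of the Gibbs inequality
```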
B2.14
Define the binary Hamming code of length and its dual. Prove that the Hamming code is perfect. Prove that in the dual code:
(i) The weight of any non-zero codeword equals ;
(ii) The distance between any pair of words equals .
[You may quote results from the course provided that they are carefully stated.]
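A small verification for length 7 (chosen as an illustrative case): the dual of the Hamming code is generated by the rows of the parity-check matrix, every non-zero dual codeword has the same weight, all pairwise distances in the dual are equal, and the Hamming balls of radius 1 around codewords exactly fill the space.

```python
import numpy as np
from itertools import product, combinations

# Parity-check matrix of the length-7 binary Hamming code: columns are 1, ..., 7 in binary.
H = np.array([[int(b) for b in format(j, '03b')] for j in range(1, 8)]).T   # 3 x 7

# The dual code is spanned by the rows of H.
dual = [tuple(np.mod(np.array(m) @ H, 2)) for m in product([0, 1], repeat=3)]

print({sum(c) for c in dual if any(c)})                                     # weights of non-zero words
print({sum(a != b for a, b in zip(u, v)) for u, v in combinations(dual, 2)})  # pairwise distances

# Perfectness: 2^4 codewords times (1 + 7) words per radius-1 Hamming ball = 2^7.
print(2**4 * (1 + 7) == 2**7)
```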
B4.13
Define the Huffman binary encoding procedure and prove its optimality among decipherable codes.
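The optimality can be checked by brute force for a small alphabet: by the Kraft-McMillan theorem, a decipherable code with lengths l_i exists if and only if sum 2^{-l_i} <= 1, so the optimum expected length is a minimisation over such length vectors. The four-symbol distribution below is an assumption chosen for the example.

```python
import heapq
from itertools import product

def huffman_lengths(probs):
    """Binary Huffman codeword lengths."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]        # illustrative distribution

# Minimum expected length over all length vectors satisfying the Kraft inequality.
best = min(
    sum(p * l for p, l in zip(probs, ls))
    for ls in product(range(1, 8), repeat=len(probs))
    if sum(2.0 ** -l for l in ls) <= 1.0
)

huff = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(best, huff)                   # Huffman attains the optimum
```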
B1.14
Let be a probability distribution, with . Prove that
All logarithms are to base 2.
[Hint: To prove (iii), it is convenient to use (i) for and (ii) for .]
Random variables and with values and from finite 'alphabets' and represent the input and output of a transmission channel, with the conditional probability . Let denote the entropy of the conditional distribution , and denote the conditional entropy of given . Define the ideal observer decoding rule as a map such that for all . Show that under this rule the error probability
satisfies , and the expected value satisfies
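The ideal observer rule can be exhibited concretely: for each output symbol it selects an input symbol of maximal posterior probability, and no other decoding rule achieves a smaller error probability. The sketch below checks this by enumerating all decoding rules for a small randomly chosen channel and input distribution (an assumption made for illustration).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)

nx, ny = 3, 3
p_x = rng.random(nx); p_x /= p_x.sum()                                    # input distribution
channel = rng.random((nx, ny)); channel /= channel.sum(axis=1, keepdims=True)  # P(y | x)
p_xy = p_x[:, None] * channel                                             # joint P(x, y)

# Ideal observer: decode each y to an x with the largest posterior P(x | y).
f_ideal = p_xy.argmax(axis=0)

def error_prob(f):
    """P(f(Y) != X) for a decoding rule given as an array f[y] = decoded x."""
    return 1.0 - sum(p_xy[f[y], y] for y in range(ny))

# Brute force over all decoding rules: the ideal observer minimises the error probability.
best = min(error_prob(np.array(f)) for f in product(range(nx), repeat=ny))
print(error_prob(f_ideal), best)
```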
B2.14
A subset of the Hamming space of cardinality and with the minimal (Hamming) distance is called an -code (not necessarily linear). An -code is called maximal if it is not contained in any -code. Prove that an -code is maximal if and only if for any there exists such that . Conclude that if there are or more changes made in a codeword then the new word is closer to some other codeword than to the original one.
Suppose that a maximal -code is used for transmitting information via a binary memoryless channel with the error probability , and the receiver uses the maximum likelihood decoder. Prove that the probability of erroneous decoding, , obeys the bounds
where
is a partial binomial sum and is the integer part.
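A simulation in the spirit of these bounds, using the length-7 Hamming code (perfect, hence maximal, with distance 3) over a binary symmetric channel with crossover probability p = 0.05 (values assumed for illustration). For a perfect code, maximum likelihood decoding fails exactly when more than one channel error occurs, so the Monte Carlo estimate matches the corresponding partial binomial sum.

```python
import numpy as np
from math import comb
from itertools import product

rng = np.random.default_rng(3)
p = 0.05                                          # BSC crossover probability

# The length-7 Hamming code: perfect, maximal, minimum distance d = 3.
G = np.array([[1,0,0,0,0,1,1],[0,1,0,0,1,0,1],[0,0,1,0,1,1,0],[0,0,0,1,1,1,1]])
code = np.array([np.mod(np.array(m) @ G, 2) for m in product([0, 1], repeat=4)])

def ml_decode(word):
    """Maximum-likelihood decoding over a BSC = nearest codeword in Hamming distance."""
    return code[np.argmin((code != word).sum(axis=1))]

# Monte Carlo estimate of the decoding error probability when the zero codeword is sent.
trials = 20000
errors = sum(ml_decode((rng.random(7) < p).astype(int)).any() for _ in range(trials))

# For a perfect code, decoding fails exactly when 2 or more channel errors occur.
tail = sum(comb(7, k) * p**k * (1 - p)**(7 - k) for k in range(2, 8))
print(errors / trials, tail)
```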
B4.13
State the Kraft inequality. Prove that it gives a necessary and sufficient condition for the existence of a prefix-free code with given codeword lengths.
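The sufficiency direction has a direct construction: sort the lengths in increasing order and take as codewords the binary expansions of the running Kraft sums. The sketch below carries this out for an illustrative set of lengths (an assumption) and checks the prefix-free property.

```python
def kraft_sum(lengths):
    return sum(2.0 ** -l for l in lengths)

def prefix_code(lengths):
    """Construct a prefix-free code with the given lengths (assumes the Kraft sum is <= 1)."""
    codewords, c = [], 0.0
    for l in sorted(lengths):
        # The codeword is the first l binary digits of the running sum c.
        codewords.append(format(int(c * 2**l), 'b').zfill(l))
        c += 2.0 ** -l
    return codewords

lengths = [1, 2, 3, 3]                     # illustrative codeword lengths
code = prefix_code(lengths)
print(kraft_sum(lengths) <= 1, code)
# Verify the prefix-free property directly.
print(all(not b.startswith(a) for a in code for b in code if a != b))
```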