Paper 1, Section II, J

Mathematics of Machine Learning | Part II, 2020

(a) Let $Z_{1}, \ldots, Z_{n}$ be i.i.d. random elements taking values in a set $\mathcal{Z}$ and let $\mathcal{F}$ be a class of functions $f: \mathcal{Z} \rightarrow \mathbb{R}$. Define the Rademacher complexity $\mathcal{R}_{n}(\mathcal{F})$. Write down an inequality relating the Rademacher complexity and

$$\mathbb{E}\left(\sup _{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n}\left(f\left(Z_{i}\right)-\mathbb{E} f\left(Z_{i}\right)\right)\right).$$

State the bounded differences inequality.
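For reference, one standard formulation of these ingredients is sketched below, with $\epsilon_{1}, \ldots, \epsilon_{n}$ i.i.d. Rademacher signs independent of the $Z_{i}$ (this is a reminder of the usual statements, not the required answer):

```latex
% Rademacher complexity: expectation over both the data and the signs
\mathcal{R}_{n}(\mathcal{F})
  = \mathbb{E}\,\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \epsilon_{i} f(Z_{i}),
  \qquad \epsilon_{i} \ \text{i.i.d. uniform on } \{-1, 1\}.
% Symmetrisation: the centred empirical process is controlled by 2 R_n(F)
\mathbb{E}\Bigl(\sup_{f \in \mathcal{F}} \frac{1}{n}
  \sum_{i=1}^{n} \bigl(f(Z_{i}) - \mathbb{E} f(Z_{i})\bigr)\Bigr)
  \leqslant 2 \mathcal{R}_{n}(\mathcal{F}).
% Bounded differences (McDiarmid): if changing the i-th argument of g alters its
% value by at most c_i, then for independent W_1, ..., W_n and all t > 0,
\mathbb{P}\bigl(g(W_{1}, \ldots, W_{n}) - \mathbb{E}\, g(W_{1}, \ldots, W_{n})
  \geqslant t\bigr)
  \leqslant \exp\Bigl(-\frac{2 t^{2}}{\sum_{i=1}^{n} c_{i}^{2}}\Bigr).
```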

(b) Now given i.i.d. input-output pairs $\left(X_{1}, Y_{1}\right), \ldots,\left(X_{n}, Y_{n}\right) \in \mathcal{X} \times\{-1,1\}$, consider performing empirical risk minimisation with misclassification loss over the class $\mathcal{H}$ of classifiers $h: \mathcal{X} \rightarrow\{-1,1\}$. Denote by $\hat{h}$ the empirical risk minimiser [which you may assume exists]. For any classifier $h$, let $R(h)$ be its misclassification risk and suppose this is minimised over $\mathcal{H}$ by $h^{*} \in \mathcal{H}$. Prove that with probability at least $1-\delta$,

$$R(\hat{h})-R\left(h^{*}\right) \leqslant 2 \mathcal{R}_{n}(\mathcal{F})+\sqrt{\frac{2 \log (2 / \delta)}{n}}$$

for $\delta \in(0,1]$, where $\mathcal{F}$ is a class of functions $f: \mathcal{X} \times\{-1,1\} \rightarrow\{0,1\}$ related to $\mathcal{H}$ that you should specify.
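A standard route (a sketch of the usual argument, writing $\hat{R}$ for empirical risk, not a full solution) takes $\mathcal{F}$ to be the induced 0-1 loss class and splits the excess risk:

```latex
% Induced loss class: each classifier contributes its 0-1 loss on X x {-1,1}
\mathcal{F} = \bigl\{ (x, y) \mapsto \mathbf{1}_{\{h(x) \neq y\}} : h \in \mathcal{H} \bigr\}.
% Risk decomposition; the middle term is <= 0 since \hat{h} minimises \hat{R}
R(\hat{h}) - R(h^{*})
  = \bigl(R(\hat{h}) - \hat{R}(\hat{h})\bigr)
  + \bigl(\hat{R}(\hat{h}) - \hat{R}(h^{*})\bigr)
  + \bigl(\hat{R}(h^{*}) - R(h^{*})\bigr)
  \leqslant \sup_{f \in \mathcal{F}}
    \Bigl(\mathbb{E} f(Z_{1}) - \frac{1}{n} \sum_{i=1}^{n} f(Z_{i})\Bigr)
  + \bigl(\hat{R}(h^{*}) - R(h^{*})\bigr).
% Symmetrisation bounds the expectation of the supremum by 2 R_n(F); the
% bounded differences inequality (each c_i = 1/n, as f takes values in {0,1})
% then controls the deviation terms at levels summing to delta.
```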

(c) Let $Z_{i}=\left(X_{i}, Y_{i}\right)$ for $i=1, \ldots, n$. Define the empirical Rademacher complexity $\hat{\mathcal{R}}\left(\mathcal{F}\left(Z_{1: n}\right)\right)$. Show that with probability at least $1-\delta$,

$$R(\hat{h})-R\left(h^{*}\right) \leqslant 2 \hat{\mathcal{R}}\left(\mathcal{F}\left(Z_{1: n}\right)\right)+2 \sqrt{\frac{2 \log (3 / \delta)}{n}}.$$
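The empirical Rademacher complexity, unlike $\mathcal{R}_{n}(\mathcal{F})$, depends only on the observed sample and so can be approximated by Monte Carlo over the signs. The following is an illustrative sketch for a hypothetical finite class of threshold classifiers on the line (the data, class, and function names are invented for the example, not part of the question):

```python
import numpy as np

def empirical_rademacher(loss_matrix, n_draws=2000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    loss_matrix: (m, n) array; row j holds f_j(Z_1), ..., f_j(Z_n) for the
    j-th function in a finite class F (here: 0-1 losses of m classifiers).
    """
    rng = np.random.default_rng(rng)
    m, n = loss_matrix.shape
    # Draw i.i.d. Rademacher signs, then average sup_f (1/n) sum_i eps_i f(Z_i)
    signs = rng.choice([-1.0, 1.0], size=(n_draws, n))
    sups = (signs @ loss_matrix.T / n).max(axis=1)
    return sups.mean()

# Toy sample: n = 200 points with labels given by the sign of x
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sign(x) + (x == 0)  # labels in {-1, 1}
# Finite class: threshold classifiers h_t(x) = sign(x - t)
thresholds = np.linspace(-1.0, 1.0, 21)
losses = np.array([(np.sign(x - t) != y).astype(float) for t in thresholds])
print(empirical_rademacher(losses, rng=1))
```

For this small class the estimate is well below the trivial bound of 1, in line with the $\sqrt{\log m / n}$ scaling one expects for a finite class of size $m$.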
