# Statistics


Paper 1, Section I, H

Let $X_{1}, \ldots, X_{n}$ be i.i.d. Bernoulli $(p)$ random variables, where $n \geqslant 3$ and $p \in(0,1)$ is unknown.

(a) What does it mean for a statistic $T$ to be sufficient for $p$ ? Find such a sufficient statistic $T$.

(b) State and prove the Rao-Blackwell theorem.

(c) By considering the estimator $X_{1} X_{2}$ of $p^{2}$, find an unbiased estimator of $p^{2}$ that is a function of the statistic $T$ found in part (a), and has variance strictly smaller than that of $X_{1} X_{2}$.
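A simulation sketch (not part of the question; the values of $n$, $p$ and the replication count are illustrative) of how Rao-Blackwellisation works here: conditioning $X_{1} X_{2}$ on $T=\sum_{i} X_{i}$ gives the estimator $T(T-1)/(n(n-1))$, which remains unbiased for $p^{2}$ but has smaller variance.

```python
import random
import statistics

# Compare the crude estimator X1*X2 of p^2 with its Rao-Blackwellisation
# E[X1*X2 | T] = T(T-1)/(n(n-1)), where T = sum of the X_i.
# n, p and reps are illustrative choices, not from the question.
random.seed(0)
n, p, reps = 10, 0.3, 100_000

crude, rb = [], []
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    t = sum(x)
    crude.append(x[0] * x[1])                 # unbiased for p^2, high variance
    rb.append(t * (t - 1) / (n * (n - 1)))    # conditional expectation given T

# Both sample means are close to p^2 = 0.09, and the Rao-Blackwellised
# estimator has strictly smaller sample variance, as the theorem guarantees.
print(statistics.mean(crude), statistics.mean(rb))
print(statistics.variance(rb) < statistics.variance(crude))
```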

Paper 1, Section II, H

(a) Show that if $W_{1}, \ldots, W_{n}$ are independent random variables with common $\operatorname{Exp}(1)$ distribution, then $\sum_{i=1}^{n} W_{i} \sim \Gamma(n, 1)$. [Hint: If $W \sim \Gamma(\alpha, \lambda)$ then $\mathbb{E} e^{t W}=\{\lambda /(\lambda-t)\}^{\alpha}$ if $t<\lambda$ and $\infty$ otherwise.]

(b) Show that if $X \sim U(0,1)$ then $-\log X \sim \operatorname{Exp}(1)$.

(c) State the Neyman-Pearson lemma.

(d) Let $X_{1}, \ldots, X_{n}$ be independent random variables with common density proportional to $x^{\theta} \mathbf{1}_{(0,1)}(x)$ for $\theta \geqslant 0$. Find a most powerful test of size $\alpha$ of $H_{0}: \theta=0$ against $H_{1}: \theta=1$, giving the critical region in terms of a quantile of an appropriate gamma distribution. Find a uniformly most powerful test of size $\alpha$ of $H_{0}: \theta=0$ against $H_{1}: \theta>0$.

Paper 2, Section I, 6H

The efficacy of a new drug was tested as follows. Fifty patients were given the drug, and another fifty patients were given a placebo. A week later, the numbers of patients whose symptoms had gone entirely, improved, stayed the same and got worse were recorded, as summarised in the following table.

\begin{tabular}{|c|c|c|} \hline & Drug & Placebo \\ \hline symptoms gone & 14 & 6 \\ improved & 21 & 19 \\ same & 10 & 10 \\ worse & 5 & 15 \\ \hline \end{tabular}

Conduct a $5 \%$ significance level test of the null hypothesis that the medicine and placebo have the same effect, against the alternative that their effects differ.

[Hint: You may find some of the following values relevant:

\begin{tabular}{|c|cccccc|} \hline Distribution & $\chi_{1}^{2}$ & $\chi_{2}^{2}$ & $\chi_{3}^{2}$ & $\chi_{4}^{2}$ & $\chi_{6}^{2}$ & $\chi_{8}^{2}$ \\ \hline 95th percentile & $3.84$ & $5.99$ & $7.81$ & $9.48$ & $12.59$ & $15.51$ \\ \hline \end{tabular}]
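The test can be checked numerically. A Python sketch (the data are the table from the question; under the null of homogeneity each expected count is row total times column total over 100):

```python
# Pearson chi-squared for the 4x2 drug/placebo table from the question.
observed = {
    "gone":     (14, 6),
    "improved": (21, 19),
    "same":     (10, 10),
    "worse":    (5, 15),
}
col_totals = [sum(row[j] for row in observed.values()) for j in (0, 1)]  # 50, 50
grand = sum(col_totals)                                                  # 100

chi2 = 0.0
for drug, placebo in observed.values():
    row_total = drug + placebo
    for obs, col in zip((drug, placebo), col_totals):
        exp = row_total * col / grand           # expected count under H0
        chi2 += (obs - exp) ** 2 / exp

# df = (4-1)*(2-1) = 3, and the 95th percentile of chi^2_3 is 7.81,
# so 8.30 > 7.81 means we reject H0 at the 5% level.
print(round(chi2, 2))
```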

Paper 3, Section II, 18H

Consider the normal linear model $Y=X \beta+\varepsilon$ where $X$ is a known $n \times p$ design matrix with $n-2>p \geqslant 1$, $\beta \in \mathbb{R}^{p}$ is an unknown vector of parameters, and $\varepsilon \sim N_{n}\left(0, \sigma^{2} I\right)$ is a vector of normal errors with each component having variance $\sigma^{2}>0$. Suppose $X$ has full column rank.

(i) Write down the maximum likelihood estimators, $\hat{\beta}$ and $\hat{\sigma}^{2}$, for $\beta$ and $\sigma^{2}$ respectively. [You need not derive these.]

(ii) Show that $\hat{\beta}$ is independent of $\hat{\sigma}^{2}$.

(iii) Find the distributions of $\hat{\beta}$ and $n \hat{\sigma}^{2} / \sigma^{2}$.

(iv) Consider the following test statistic for testing the null hypothesis $H_{0}: \beta=0$ against the alternative $\beta \neq 0$ :

$T:=\frac{\|\hat{\beta}\|^{2}}{n \hat{\sigma}^{2}} .$

Let $\lambda_{1} \geqslant \lambda_{2} \geqslant \cdots \geqslant \lambda_{p}>0$ be the eigenvalues of $X^{T} X$. Show that under $H_{0}, T$ has the same distribution as

$\frac{\sum_{j=1}^{p} \lambda_{j}^{-1} W_{j}}{Z}$

where $Z \sim \chi_{n-p}^{2}$ and $W_{1}, \ldots, W_{p}$ are independent $\chi_{1}^{2}$ random variables, independent of $Z$.

[Hint: You may use the fact that $X=U D V^{T}$ where $U \in \mathbb{R}^{n \times p}$ has orthonormal columns, $V \in \mathbb{R}^{p \times p}$ is an orthogonal matrix and $D \in \mathbb{R}^{p \times p}$ is a diagonal matrix with $D_{i i}=\sqrt{\lambda_{i}}$.]

(v) Find $\mathbb{E} T$ when $\beta \neq 0$. [Hint: If $R \sim \chi_{\nu}^{2}$ with $\nu>2$, then $\mathbb{E}(1 / R)=1 /(\nu-2)$.]

Paper 4, Section II, 17H

Suppose we wish to estimate the probability $\theta \in(0,1)$ that a potentially biased coin lands heads up when tossed. After $n$ independent tosses, we observe $X$ heads.

(a) Write down the maximum likelihood estimator $\hat{\theta}$ of $\theta$.

(b) Find the mean squared error $f(\theta)$ of $\hat{\theta}$ as a function of $\theta$. Compute $\sup _{\theta \in(0,1)} f(\theta)$.

(c) Suppose a uniform prior is placed on $\theta$. Find the Bayes estimator of $\theta$ under squared error loss $L(\theta, a)=(\theta-a)^{2}$.

(d) Now find the Bayes estimator $\tilde{\theta}$ under the loss $L(\theta, a)=\theta^{\alpha-1}(1-\theta)^{\beta-1}(\theta-a)^{2}$, where $\alpha, \beta \geqslant 1$. Show that

$\tilde{\theta}=w \hat{\theta}+(1-w) \theta_{0}, \quad(*)$

where $w$ and $\theta_{0}$ depend on $n, \alpha$ and $\beta$.

(e) Determine the mean squared error $g_{w, \theta_{0}}(\theta)$ of $\tilde{\theta}$ as defined by $(*)$.

(f) For what range of values of $w$ do we have $\sup _{\theta \in(0,1)} g_{w, 1 / 2}(\theta) \leqslant \sup _{\theta \in(0,1)} f(\theta)$ ?

[Hint: The mean of a Beta $(a, b)$ distribution is $a /(a+b)$ and its density $p(u)$ at $u \in[0,1]$ is $c_{a, b} u^{a-1}(1-u)^{b-1}$, where $c_{a, b}$ is a normalising constant.]
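A numerical illustration of parts (e) and (f) (not part of the question; $n=20$ and the grid are arbitrary choices): compare the worst-case MSE of the MLE with that of the shrinkage estimator $w \hat{\theta}+(1-w)/2$. The particular weight $w=\sqrt{n}/(1+\sqrt{n})$ makes the MSE constant in $\theta$, so it certainly beats the MLE's worst case.

```python
# Worst-case MSE comparison for the MLE X/n versus the shrinkage
# estimator w*(X/n) + (1-w)/2, with theta0 = 1/2. n is illustrative.
n = 20

def f(theta):                 # MSE of the MLE
    return theta * (1 - theta) / n

def g(theta, w):              # MSE of the shrinkage estimator
    return w**2 * theta * (1 - theta) / n + (1 - w)**2 * (theta - 0.5)**2

grid = [i / 1000 for i in range(1, 1000)]
sup_f = max(f(t) for t in grid)          # attained at theta = 1/2: 1/(4n)

# w = sqrt(n)/(1 + sqrt(n)) equalises the risk over theta (minimax choice)
w = n**0.5 / (1 + n**0.5)
sup_g = max(g(t, w) for t in grid)
print(sup_g <= sup_f)   # True: the constant-risk estimator wins in the worst case
```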

Paper 1, Section I, 6H

Suppose $X_{1}, \ldots, X_{n}$ are independent with distribution $N(\mu, 1)$. Suppose a prior $\mu \sim N\left(\theta, \tau^{-2}\right)$ is placed on the unknown parameter $\mu$ for some given deterministic $\theta \in \mathbb{R}$ and $\tau>0$. Derive the posterior mean.

Find an expression for the mean squared error of this posterior mean when $\theta=0$.

Paper 1, Section II, H

Let $X_{1}, \ldots, X_{n}$ be i.i.d. $U[0,2 \theta]$ random variables, where $\theta>0$ is unknown.

(a) Derive the maximum likelihood estimator $\hat{\theta}$ of $\theta$.

(b) What is a sufficient statistic? What is a minimal sufficient statistic? Is $\hat{\theta}$ sufficient for $\theta$ ? Is it minimal sufficient? Answer the same questions for the sample mean $\tilde{\theta}:=\sum_{i=1}^{n} X_{i} / n$. Briefly justify your answers.

[You may use any result from the course provided it is stated clearly.]

(c) Show that the mean squared errors of $\hat{\theta}$ and $\tilde{\theta}$ are respectively

$\frac{2 \theta^{2}}{(n+1)(n+2)} \quad \text { and } \quad \frac{\theta^{2}}{3 n} \text {. }$

(d) Show that for each $t \in \mathbb{R}, \lim _{n \rightarrow \infty} \mathbb{P}(n(1-\hat{\theta} / \theta) \geqslant t)=h(t)$ for a function $h$ you should specify. Give, with justification, an approximate $1-\alpha$ confidence interval for $\theta$ whose expected length is

$\left(\frac{n \theta}{n+1}\right)\left(\frac{\log (1 / \alpha)}{n-\log (1 / \alpha)}\right)$

[Hint: $\lim _{n \rightarrow \infty}\left(1-\frac{t}{n}\right)^{n}=e^{-t}$ for all $t \in \mathbb{R}$.]
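The mean squared errors in part (c) can be checked by simulation. A sketch (not part of the question; $n$, $\theta$ and the replication count are illustrative):

```python
import random

# Monte Carlo check of the MSE formulas for the MLE max(X)/2 and the
# sample mean, for X_i ~ U[0, 2*theta]. Values below are illustrative.
random.seed(1)
n, theta, reps = 10, 2.0, 200_000

se_hat = se_tilde = 0.0
for _ in range(reps):
    x = [random.uniform(0, 2 * theta) for _ in range(n)]
    se_hat += (max(x) / 2 - theta) ** 2     # MLE: max X_i / 2
    se_tilde += (sum(x) / n - theta) ** 2   # sample mean

mse_hat, mse_tilde = se_hat / reps, se_tilde / reps
print(mse_hat, 2 * theta**2 / ((n + 1) * (n + 2)))   # both near 0.0606
print(mse_tilde, theta**2 / (3 * n))                 # both near 0.1333
```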

Paper 2, Section II, H

Consider the general linear model $Y=X \beta^{0}+\varepsilon$ where $X$ is a known $n \times p$ design matrix with $p \geqslant 2$, $\beta^{0} \in \mathbb{R}^{p}$ is an unknown vector of parameters, and $\varepsilon \in \mathbb{R}^{n}$ is a vector of stochastic errors with $\mathbb{E}\left(\varepsilon_{i}\right)=0$, $\operatorname{var}\left(\varepsilon_{i}\right)=\sigma^{2}>0$ and $\operatorname{cov}\left(\varepsilon_{i}, \varepsilon_{j}\right)=0$ for all $i, j=1, \ldots, n$ with $i \neq j$. Suppose $X$ has full column rank.

(a) Write down the least squares estimate $\hat{\beta}$ of $\beta^{0}$ and show that it minimises the least squares objective $S(\beta)=\|Y-X \beta\|^{2}$ over $\beta \in \mathbb{R}^{p}$.

(b) Write down the variance-covariance matrix $\operatorname{cov}(\hat{\beta})$.

(c) Let $\tilde{\beta} \in \mathbb{R}^{p}$ minimise $S(\beta)$ over $\beta \in \mathbb{R}^{p}$ subject to $\beta_{p}=0$. Let $Z$ be the $n \times(p-1)$ submatrix of $X$ that excludes the final column. Write down $\operatorname{cov}(\tilde{\beta})$.

(d) Let $P$ and $P_{0}$ be $n \times n$ orthogonal projections onto the column spaces of $X$ and $Z$ respectively. Show that for all $u \in \mathbb{R}^{n}, u^{T} P u \geqslant u^{T} P_{0} u$.

(e) Show that for all $x \in \mathbb{R}^{p}$,

$\operatorname{var}\left(x^{T} \tilde{\beta}\right) \leqslant \operatorname{var}\left(x^{T} \hat{\beta}\right) .$

[Hint: Argue that $x=X^{T} u$ for some $u \in \mathbb{R}^{n}$.]

Paper 1, Section I, H

Suppose that $X_{1}, \ldots, X_{n}$ are i.i.d. $N\left(\mu, \sigma^{2}\right)$ random variables.

(a) Compute the MLEs $\widehat{\mu}, \widehat{\sigma}^{2}$ for the unknown parameters $\mu, \sigma^{2}$.

(b) Give the definition of an unbiased estimator. Determine whether $\widehat{\mu}, \widehat{\sigma}^{2}$ are unbiased estimators for $\mu, \sigma^{2}$.

Paper 1, Section II, H

State and prove the Neyman-Pearson lemma.

Suppose that $X_{1}, \ldots, X_{n}$ are i.i.d. $\exp (\lambda)$ random variables where $\lambda$ is an unknown parameter. We wish to test the hypothesis $H_{0}: \lambda=\lambda_{0}$ against the hypothesis $H_{1}: \lambda=\lambda_{1}$ where $\lambda_{1}<\lambda_{0}$.

(a) Find the critical region of the likelihood ratio test of size $\alpha$ in terms of the sample mean $\bar{X}$.

(b) Define the power function of a hypothesis test and identify the power function in the setting described above in terms of the $\Gamma(n, \lambda)$ probability distribution function. [You may use without proof that $X_{1}+\cdots+X_{n}$ is distributed as a $\Gamma(n, \lambda)$ random variable.]

(c) Define what it means for a hypothesis test to be uniformly most powerful. Determine whether the likelihood ratio test considered above is uniformly most powerful for testing $H_{0}: \lambda=\lambda_{0}$ against $\widetilde{H}_{1}: \lambda<\lambda_{0}$.
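A simulation sketch of part (a) (not part of the question; $n$, $\lambda_{0}$ and $\alpha$ are illustrative choices): since $\lambda_{1}<\lambda_{0}$, the likelihood ratio is increasing in $\sum_{i} x_{i}$, so the size-$\alpha$ test rejects when $\sum_{i} X_{i}$ exceeds the upper $\alpha$ point of $\Gamma\left(n, \lambda_{0}\right)$. For integer $n$ the $\Gamma(n, \lambda)$ survival function has a closed (Erlang) form, which lets us find the critical value and check the size by simulation.

```python
import math
import random

def erlang_sf(s, n, lam):
    """P(Gamma(n, lam) > s) for integer shape n (Erlang survival function)."""
    return math.exp(-lam * s) * sum((lam * s) ** k / math.factorial(k)
                                    for k in range(n))

n, lam0, alpha = 5, 2.0, 0.05   # illustrative values

# Critical region {sum X_i > c}: solve erlang_sf(c, n, lam0) = alpha by bisection
lo, hi = 0.0, 100.0
for _ in range(200):
    mid = (lo + hi) / 2
    if erlang_sf(mid, n, lam0) > alpha:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

# Simulate under H0 and check the rejection rate is close to alpha
random.seed(2)
reps = 100_000
rejections = sum(sum(random.expovariate(lam0) for _ in range(n)) > c
                 for _ in range(reps))
print(rejections / reps)   # close to alpha = 0.05
```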

Paper 2, Section I, H

Suppose that $X_{1}, \ldots, X_{n}$ are i.i.d. coin tosses with probability $\theta$ of obtaining a head.

(a) Compute the posterior distribution of $\theta$ given the observations $X_{1}, \ldots, X_{n}$ in the case of a uniform prior on $[0,1]$.

(b) Give the definition of the quadratic error loss function.

(c) Determine the value $\widehat{\theta}$ of $\theta$ which minimizes the quadratic error loss function. Justify your answer. Calculate $\mathbb{E}[\widehat{\theta}]$.

[You may use that the $\beta(a, b), a, b>0$, distribution has density function on $[0,1]$ given by

$c_{a, b} x^{a-1}(1-x)^{b-1}$

where $c_{a, b}$ is a normalizing constant. You may also use without proof that the mean of a $\beta(a, b)$ random variable is $a /(a+b) .]$
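With a uniform prior and $s$ heads in $n$ tosses, the posterior is $\operatorname{Beta}(s+1, n-s+1)$, whose mean $(s+1)/(n+2)$ minimises quadratic loss. A quick numerical check of that mean by midpoint integration (the values of $n$ and $s$ are illustrative, not from the question):

```python
# Verify that the Beta(s+1, n-s+1) posterior mean equals (s+1)/(n+2)
# by numerical integration of the unnormalised density. n, s illustrative.
n, s = 10, 7

def post(t):                         # unnormalised Beta(s+1, n-s+1) density
    return t**s * (1 - t)**(n - s)

grid = [(i + 0.5) / 10_000 for i in range(10_000)]   # midpoints on (0, 1)
norm = sum(post(t) for t in grid)
mean = sum(t * post(t) for t in grid) / norm
print(mean, (s + 1) / (n + 2))   # both close to 0.6667
```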

Paper 3, Section II, H

Suppose that $X_{1}, \ldots, X_{n}$ are i.i.d. $N\left(\mu, \sigma^{2}\right)$. Let

$\bar{X}=\frac{1}{n} \sum_{i=1}^{n} X_{i} \quad \text { and } \quad S_{X X}=\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}$

(a) Compute the distributions of $\bar{X}$ and $S_{X X}$ and show that $\bar{X}$ and $S_{X X}$ are independent.

(b) Write down the distribution of $\sqrt{n}(\bar{X}-\mu) / \sqrt{S_{X X} /(n-1)}$.

(c) For $\alpha \in(0,1)$, find a $100(1-\alpha) \%$ confidence interval in each of the following situations: (i) for $\mu$ when $\sigma^{2}$ is known; (ii) for $\mu$ when $\sigma^{2}$ is not known; (iii) for $\sigma^{2}$ when $\mu$ is not known.

(d) Suppose that $\widetilde{X}_{1}, \ldots, \widetilde{X}_{\widetilde{n}}$ are i.i.d. $N\left(\widetilde{\mu}, \widetilde{\sigma}^{2}\right)$. Explain how you would use the $F$ test to test the hypothesis $H_{0}: \sigma^{2}=\widetilde{\sigma}^{2}$ against the alternative $H_{1}: \sigma^{2}>\widetilde{\sigma}^{2}$. Does the $F$ test depend on whether $\mu, \widetilde{\mu}$ are known?

Paper 4, Section II, 19H

Consider the linear model

$Y_{i}=\beta x_{i}+\epsilon_{i} \quad \text { for } \quad i=1, \ldots, n$

where $x_{1}, \ldots, x_{n}$ are known and $\epsilon_{1}, \ldots, \epsilon_{n}$ are i.i.d. $N\left(0, \sigma^{2}\right)$. We assume that the parameters $\beta$ and $\sigma^{2}$ are unknown.

(a) Find the MLE $\widehat{\beta}$ of $\beta$. Explain why $\widehat{\beta}$ is the same as the least squares estimator of $\beta$.

(b) State and prove the Gauss-Markov theorem for this model.

(c) For each value of $\theta \in \mathbb{R}$ with $\theta \neq 0$, determine the unbiased linear estimator $\tilde{\beta}$ of $\beta$ which minimizes

$\mathbb{E}_{\beta, \sigma^{2}}[\exp (\theta(\tilde{\beta}-\beta))]$

Paper 1, Section I, H

$X_{1}, X_{2}, \ldots, X_{n}$ form a random sample from a distribution whose probability density function is

$f(x ; \theta)=\left\{\begin{array}{cc} \frac{2 x}{\theta^{2}} & 0 \leqslant x \leqslant \theta \\ 0 & \text { otherwise } \end{array}\right.$

where the value of the positive parameter $\theta$ is unknown. Determine the maximum likelihood estimator of the median of this distribution.

Paper 1, Section II, H

(a) Consider the general linear model $Y=X \theta+\varepsilon$ where $X$ is a known $n \times p$ matrix, $\theta$ is an unknown $p \times 1$ vector of parameters, and $\varepsilon$ is an $n \times 1$ vector of independent $N\left(0, \sigma^{2}\right)$ random variables with unknown variance $\sigma^{2}$. Show that, provided the matrix $X$ is of rank $p$, the least squares estimate of $\theta$ is

$\hat{\theta}=\left(X^{\mathrm{T}} X\right)^{-1} X^{\mathrm{T}} Y$

Let

$\hat{\varepsilon}=Y-X \hat{\theta}$

What is the distribution of $\hat{\varepsilon}^{\mathrm{T}} \hat{\varepsilon}$ ? Write down, in terms of $\hat{\varepsilon}^{\mathrm{T}} \hat{\varepsilon}$, an unbiased estimator of $\sigma^{2}$.

(b) Four points on the ground form the vertices of a plane quadrilateral with interior angles $\theta_{1}, \theta_{2}, \theta_{3}, \theta_{4}$, so that $\theta_{1}+\theta_{2}+\theta_{3}+\theta_{4}=2 \pi$. Aerial observations $Z_{1}, Z_{2}, Z_{3}, Z_{4}$ are made of these angles, where the observations are subject to independent errors distributed as $N\left(0, \sigma^{2}\right)$ random variables.

(i) Represent the preceding model as a general linear model with observations $\left(Z_{1}, Z_{2}, Z_{3}, Z_{4}-2 \pi\right)$ and unknown parameters $\left(\theta_{1}, \theta_{2}, \theta_{3}\right)$.

(ii) Find the least squares estimates $\hat{\theta}_{1}, \hat{\theta}_{2}, \hat{\theta}_{3}$.

(iii) Determine an unbiased estimator of $\sigma^{2}$. What is its distribution?

Paper 2, Section I, 8H

Define a simple hypothesis. Define the terms size and power for a test of one simple hypothesis against another. State the Neyman-Pearson lemma.

There is a single observation of a random variable $X$ which has a probability density function $f(x)$. Construct a best test of size $0.05$ for the null hypothesis

$H_{0}: \quad f(x)=\frac{1}{2}, \quad-1 \leqslant x \leqslant 1,$

against the alternative hypothesis

$H_{1}: \quad f(x)=\frac{3}{4}\left(1-x^{2}\right), \quad-1 \leqslant x \leqslant 1 .$

Calculate the power of your test.
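A sketch of the computation (values follow directly from the question): the likelihood ratio $f_{1}(x) / f_{0}(x)=\frac{3}{2}\left(1-x^{2}\right)$ is decreasing in $|x|$, so the best size-$0.05$ test rejects on $\{|x|<c\}$, and under $H_{0}$ (uniform on $[-1,1]$) this region has probability $c$.

```python
# Best test of size 0.05: reject on {|x| < c} with c = 0.05, since
# P0(|X| < c) = c under the U[-1, 1] null. Power = integral of f1 over (-c, c).
c = 0.05
power = 2 * (3 / 4) * (c - c**3 / 3)   # (3/4) * [x - x^3/3] from -c to c
print(power)   # approximately 0.0749
```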

Paper 3, Section II, H

A treatment is suggested for a particular illness. The results of treating a number of patients chosen at random from those in a hospital suffering from the illness are shown in the following table, in which the entries $a, b, c, d$ are numbers of patients.

$\begin{array}{lcc} & \text { Recovery } & \text { Non-recovery } \\ \text { Untreated } & a & b \\ \text { Treated } & c & d\end{array}$

Describe the use of Pearson's $\chi^{2}$ statistic in testing whether the treatment affects recovery, and outline a justification derived from the generalised likelihood ratio statistic. Show that

$\chi^{2}=\frac{(a d-b c)^{2}(a+b+c+d)}{(a+b)(c+d)(a+c)(b+d)}$

[Hint: You may find it helpful to observe that $a(a+b+c+d)-(a+b)(a+c)=a d-b c .]$

Comment on the use of this statistical technique when

$a=50, \quad b=10, \quad c=15, \quad d=5 .$
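The closed form for $\chi^{2}$ can be checked against a direct Pearson computation at the given numbers (this check is editorial, not part of the question):

```python
# Verify the 2x2 closed form against the cell-by-cell Pearson statistic.
a, b, c, d = 50, 10, 15, 5
N = a + b + c + d

# Closed form from the question
chi2_formula = (a * d - b * c) ** 2 * N / ((a + b) * (c + d) * (a + c) * (b + d))

# Direct Pearson computation over the four cells
rows, cols = (a + b, c + d), (a + c, b + d)
obs = ((a, b), (c, d))
chi2_direct = sum((obs[i][j] - rows[i] * cols[j] / N) ** 2
                  / (rows[i] * cols[j] / N)
                  for i in (0, 1) for j in (0, 1))
print(chi2_formula, chi2_direct)   # both approximately 0.684, well below 3.84
```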

Paper 4, Section II, H

There is widespread agreement amongst the managers of the Reliable Motor Company that the number $X$ of faulty cars produced in a month has a binomial distribution

$P(X=s)=\left(\begin{array}{c} n \\ s \end{array}\right) p^{s}(1-p)^{n-s} \quad(s=0,1, \ldots, n ; \quad 0 \leqslant p \leqslant 1)$

where $n$ is the total number of cars produced in a month. There is, however, some dispute about the parameter $p$. The general manager has a prior distribution for $p$ which is uniform, while the more pessimistic production manager has a prior distribution with density $2 p$, both on the interval $[0,1]$.

In a particular month, $s$ faulty cars are produced. Show that if the general manager's loss function is $(\hat{p}-p)^{2}$, where $\hat{p}$ is her estimate and $p$ the true value, then her best estimate of $p$ is

$\hat{p}=\frac{s+1}{n+2}$

The production manager has responsibilities different from those of the general manager, and a different loss function given by $(1-p)(\hat{p}-p)^{2}$. Find his best estimate of $p$ and show that it is greater than that of the general manager unless $s \geqslant \frac{1}{2} n$.

[You may use the fact that for non-negative integers $\alpha, \beta$,

$\left.\int_{0}^{1} p^{\alpha}(1-p)^{\beta} d p=\frac{\alpha ! \beta !}{(\alpha+\beta+1) !}\right]$

Paper 1, Section I, H

(a) State and prove the Rao-Blackwell theorem.

(b) Let $X_{1}, \ldots, X_{n}$ be an independent sample from $\operatorname{Poisson}(\lambda)$ with $\theta=e^{-\lambda}$ to be estimated. Show that $Y=1_{\{0\}}\left(X_{1}\right)$ is an unbiased estimator of $\theta$ and that $T=\sum_{i} X_{i}$ is a sufficient statistic.

What is $\mathbb{E}[Y \mid T] ?$
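The answer $\mathbb{E}[Y \mid T=t]=((n-1) / n)^{t}$ (given $T=t$, the $X_{i}$ are multinomial with equal cell probabilities) can be checked by simulation. A sketch, with illustrative values of $n$, $\lambda$ and $t$; the Poisson sampler below is Knuth's method, adequate for small $\lambda$:

```python
import math
import random

random.seed(3)

def poisson(lam):
    # Knuth's multiplication method; fine for small lam
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

n, lam, reps, t0 = 5, 1.2, 300_000, 6   # illustrative values

# Estimate P(X1 = 0 | T = t0) empirically and compare with ((n-1)/n)^t0
hits = total = 0
for _ in range(reps):
    x = [poisson(lam) for _ in range(n)]
    if sum(x) == t0:
        total += 1
        hits += (x[0] == 0)

print(hits / total, ((n - 1) / n) ** t0)   # both close to 0.262
```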

Paper 1, Section II, H

(a) Give the definitions of a sufficient and a minimal sufficient statistic $T$ for an unknown parameter $\theta$.

Let $X_{1}, X_{2}, \ldots, X_{n}$ be an independent sample from the geometric distribution with success probability $1 / \theta$ and mean $\theta>1$, i.e. with probability mass function

$p(m)=\frac{1}{\theta}\left(1-\frac{1}{\theta}\right)^{m-1} \text { for } m=1,2, \ldots$

Find a minimal sufficient statistic for $\theta$. Is your statistic a biased estimator of $\theta ?$

[You may use results from the course provided you state them clearly.]

(b) Define the bias of an estimator. What does it mean for an estimator to be unbiased?

Suppose that $Y$ has the truncated Poisson distribution with probability mass function

$p(y)=\left(e^{\theta}-1\right)^{-1} \cdot \frac{\theta^{y}}{y !} \quad \text { for } y=1,2, \ldots$

Show that the only unbiased estimator $T$ of $1-e^{-\theta}$ based on $Y$ is obtained by taking $T=0$ if $Y$ is odd and $T=2$ if $Y$ is even.

Is this a useful estimator? Justify your answer.
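A numerical check of the unbiasedness claim (editorial, not part of the question; $\theta$ is an arbitrary value): with $T=2$ on even $y$ and $0$ on odd $y$, summing $2 p(y)$ over even $y$ gives $2(\cosh \theta-1) /\left(e^{\theta}-1\right)$, which equals $1-e^{-\theta}$.

```python
import math

theta = 1.3   # illustrative value

# E[T] under the truncated Poisson: T = 2 on even y, 0 on odd y.
# The series below converges long before y = 60.
ET = sum(2 * theta**y / math.factorial(y)
         for y in range(2, 60, 2)) / (math.exp(theta) - 1)
print(ET, 1 - math.exp(-theta))   # equal, confirming unbiasedness
```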

Paper 2, Section I, 8H

(a) Define a $100 \gamma \%$ confidence interval for an unknown parameter $\theta$.

(b) Let $X_{1}, \ldots, X_{n}$ be i.i.d. random variables with distribution $N(\mu, 1)$ with $\mu$ unknown. Find a $95 \%$ confidence interval for $\mu$.

[You may use the fact that $\Phi(1.96) \simeq 0.975 .]$

(c) Let $U_{1}, U_{2}$ be independent $U[\theta-1, \theta+1]$ with $\theta$ to be estimated. Find a $50 \%$ confidence interval for $\theta$.

Suppose that we have two observations $u_{1}=10$ and $u_{2}=11.5$. What might be a better interval to report in this case?
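For part (c), the interval $\left[\min \left(U_{1}, U_{2}\right), \max \left(U_{1}, U_{2}\right)\right]$ has coverage exactly $1 / 2$: it misses $\theta$ only when both observations fall on the same side, each such event having probability $1 / 4$. A simulation sketch (the value of $\theta$ and the replication count are illustrative):

```python
import random

# Coverage check for the 50% interval [min(U1, U2), max(U1, U2)]
# with U_i ~ U[theta - 1, theta + 1]. theta is an illustrative value.
random.seed(4)
theta, reps = 3.0, 200_000

cover = 0
for _ in range(reps):
    u1 = random.uniform(theta - 1, theta + 1)
    u2 = random.uniform(theta - 1, theta + 1)
    cover += min(u1, u2) <= theta <= max(u1, u2)

print(cover / reps)   # close to 0.5
```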

Paper 3, Section II, 20H

Consider the general linear model

$\boldsymbol{Y}=X \boldsymbol{\beta}+\varepsilon$

where $X$ is a known $n \times p$ matrix of full rank $p<n, \varepsilon \sim \mathcal{N}_{n}\left(0, \sigma^{2} I\right)$ with $\sigma^{2}$ known and $\boldsymbol{\beta} \in \mathbb{R}^{p}$ is an unknown vector.

(a) State without proof the Gauss-Markov theorem.

Find the maximum likelihood estimator $\widehat{\boldsymbol{\beta}}$ for $\boldsymbol{\beta}$. Is it unbiased?

Let $\boldsymbol{\beta}^{*}$ be any unbiased estimator for $\boldsymbol{\beta}$ which is linear in $\left(Y_{i}\right)$. Show that

$\operatorname{var}\left(\boldsymbol{t}^{T} \widehat{\boldsymbol{\beta}}\right) \leqslant \operatorname{var}\left(\boldsymbol{t}^{T} \boldsymbol{\beta}^{*}\right)$

for all $\boldsymbol{t} \in \mathbb{R}^{p}$.

(b) Suppose now that $p=1$ and that $\boldsymbol{\beta}$ and $\sigma^{2}$ are both unknown. Find the maximum likelihood estimator for $\sigma^{2}$. What is the joint distribution of $\widehat{\boldsymbol{\beta}}$ and $\widehat{\sigma}^{2}$ in this case? Justify your answer.

Paper 4, Section II, H

(a) State and prove the Neyman-Pearson lemma.

(b) Let $X$ be a real random variable with density $f(x)=(2 \theta x+1-\theta) 1_{[0,1]}(x)$ with $-1 \leqslant \theta \leqslant 1 .$

Find a most powerful test of size $\alpha$ of $H_{0}: \theta=0$ versus $H_{1}: \theta=1$.

Find a uniformly most powerful test of size $\alpha$ of $H_{0}: \theta=0$ versus $H_{1}: \theta>0$.

Paper 1, Section I, H

Let $X_{1}, \ldots, X_{n}$ be independent samples from the exponential distribution with density $f(x ; \lambda)=\lambda e^{-\lambda x}$ for $x>0$, where $\lambda$ is an unknown parameter. Find the critical region of the most powerful test of size $\alpha$ for the hypotheses $H_{0}: \lambda=1$ versus $H_{1}: \lambda=2$. Determine whether or not this test is uniformly most powerful for testing $H_{0}^{\prime}: \lambda \leqslant 1$ versus $H_{1}^{\prime}: \lambda>1$.

Paper 1, Section II, H

(a) What does it mean to say a statistic $T$ is sufficient for an unknown parameter $\theta$? State the factorisation criterion for sufficiency and prove it in the discrete case.

(b) State and prove the Rao-Blackwell theorem.

(c) Let $X_{1}, \ldots, X_{n}$ be independent samples from the uniform distribution on $[-\theta, \theta]$ for an unknown positive parameter $\theta$. Consider the two-dimensional statistic

$T=\left(\min _{i} X_{i}, \max _{i} X_{i}\right) .$

Prove that $T$ is sufficient for $\theta$. Determine, with proof, whether or not $T$ is minimally sufficient.

Paper 2, Section I, H

The efficacy of a new medicine was tested as follows. Fifty patients were given the medicine, and another fifty patients were given a placebo. A week later, the number of patients who got better, stayed the same, or got worse was recorded, as summarised in this table:

\begin{tabular}{|l|c|c|} \hline & medicine & placebo \\ \hline better & 28 & 22 \\ same & 4 & 16 \\ worse & 18 & 12 \\ \hline \end{tabular}

Conduct a Pearson chi-squared test of size $1 \%$ of the hypothesis that the medicine and the placebo have the same effect.

[Hint: You may find the following values relevant:

$\left.\begin{array}{lcccccc}\text { Distribution } & \chi_{1}^{2} & \chi_{2}^{2} & \chi_{3}^{2} & \chi_{4}^{2} & \chi_{5}^{2} & \chi_{6}^{2} \\ \text { 99th percentile } & 6.63 & 9.21 & 11.34 & 13.3 & 15.09 & 16.81\end{array}\right]$
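The computation can be checked numerically. A Python sketch (the data are the 3x2 table from the question; both column totals are 50, so each expected count is half the row total):

```python
# Pearson chi-squared for the 3x2 medicine/placebo table from the question.
observed = {"better": (28, 22), "same": (4, 16), "worse": (18, 12)}
grand = 100

chi2 = 0.0
for med, plac in observed.values():
    row = med + plac
    for obs in (med, plac):
        exp = row * 50 / grand      # both column totals are 50
        chi2 += (obs - exp) ** 2 / exp

# df = (3-1)*(2-1) = 2; the 99th percentile of chi^2_2 is 9.21,
# and 9.12 < 9.21, so H0 is not rejected at the 1% level.
print(round(chi2, 2))
```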

Paper 3, Section II, H

Let $X_{1}, \ldots, X_{n}$ be independent samples from the Poisson distribution with mean $\theta$.

(a) Compute the maximum likelihood estimator of $\theta$. Is this estimator biased?

(b) Under the assumption that $n$ is very large, use the central limit theorem to find an approximate $95 \%$ confidence interval for $\theta$. [You may use the notation $z_{\alpha}$ for the number such that $\mathbb{P}\left(Z \geqslant z_{\alpha}\right)=\alpha$ for a standard normal $\left.Z \sim N(0,1) .\right]$

(c) Now suppose the parameter $\theta$ has the $\Gamma(k, \lambda)$ prior distribution. What is the posterior distribution? What is the Bayes point estimator for $\theta$ for the quadratic loss function $L(\theta, a)=(\theta-a)^{2} ?$ Let $X_{n+1}$ be another independent sample from the same distribution. Given $X_{1}, \ldots, X_{n}$, what is the posterior probability that $X_{n+1}=0$ ?

[Hint: The density of the $\Gamma(k, \lambda)$ distribution is $f(x ; k, \lambda)=\lambda^{k} x^{k-1} e^{-\lambda x} / \Gamma(k)$, for $x>0$.]
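For the last part of (c), the posterior is $\Gamma\left(k+\sum_{i} x_{i}, \lambda+n\right)$ and the predictive probability is $\mathbb{P}\left(X_{n+1}=0 \mid\right.$ data $)=\mathbb{E}\left[e^{-\theta}\right]$ under the posterior, which by the gamma MGF equals $(b /(b+1))^{a}$ with $a=k+\sum_{i} x_{i}$, $b=\lambda+n$. A numerical sketch checking this identity (the values of $k$, $\lambda$, $n$ and $\sum x_{i}$ are illustrative):

```python
import math

k, lam = 2, 1.0          # prior Gamma(k, lam); illustrative values
n, sx = 5, 9             # n observations with sum sx; illustrative values
a, b = k + sx, lam + n   # posterior Gamma(a, b)

# E[e^{-theta}] under the posterior, by midpoint integration over (0, 20)
grid = [(i + 0.5) / 1000 * 20 for i in range(1000)]
weight = [t ** (a - 1) * math.exp(-b * t) for t in grid]   # unnormalised density
pred = sum(math.exp(-t) * w for t, w in zip(grid, weight)) / sum(weight)

print(pred, (b / (b + 1)) ** a)   # both close to 0.183
```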

Paper 4, Section II, H

Consider the linear regression model

$Y_{i}=\alpha+\beta x_{i}+\varepsilon_{i}$

for $i=1, \ldots, n$, where the non-zero numbers $x_{1}, \ldots, x_{n}$ are known and are such that $x_{1}+\ldots+x_{n}=0$, the independent random variables $\varepsilon_{1}, \ldots, \varepsilon_{n}$ have the $N\left(0, \sigma^{2}\right)$ distribution, and the parameters $\alpha, \beta$ and $\sigma^{2}$ are unknown.

(a) Let $(\hat{\alpha}, \hat{\beta})$ be the maximum likelihood estimator of $(\alpha, \beta)$. Prove that for each $i$, the random variables $\hat{\alpha}, \hat{\beta}$ and $Y_{i}-\hat{\alpha}-\hat{\beta} x_{i}$ are uncorrelated. Using standard facts about the multivariate normal distribution, prove that $\hat{\alpha}, \hat{\beta}$ and $\sum_{i=1}^{n}\left(Y_{i}-\hat{\alpha}-\hat{\beta} x_{i}\right)^{2}$ are independent.

(b) Find the critical region of the generalised likelihood ratio test of size $5 \%$ for testing $H_{0}: \alpha=0$ versus $H_{1}: \alpha \neq 0$. Prove that the power function of this test is of the form $w\left(\alpha, \beta, \sigma^{2}\right)=g(\alpha / \sigma)$ for some function $g$. [You are not required to find $g$ explicitly.]

Paper 1, Section I, H

Suppose that $X_{1}, \ldots, X_{n}$ are independent normally distributed random variables, each with mean $\mu$ and variance 1, and consider testing $H_{0}: \mu=0$ against $H_{1}: \mu=1$. Explain what is meant by the critical region, the size and the power of a test.

For $0<\alpha<1$, derive the test that is most powerful among all tests of size at most $\alpha$. Obtain an expression for the power of your test in terms of the standard normal distribution function $\Phi(\cdot)$.

[Results from the course may be used without proof provided they are clearly stated.]

Paper 1, Section II, H

Suppose $X_{1}, \ldots, X_{n}$ are independent identically distributed random variables each with probability mass function $\mathbb{P}\left(X_{i}=x_{i}\right)=p\left(x_{i} ; \theta\right)$, where $\theta$ is an unknown parameter. State what is meant by a sufficient statistic for $\theta$. State the factorisation criterion for a sufficient statistic. State and prove the Rao-Blackwell theorem.

Suppose that $X_{1}, \ldots, X_{n}$ are independent identically distributed random variables with

$\mathbb{P}\left(X_{i}=x_{i}\right)=\left(\begin{array}{c} m \\ x_{i} \end{array}\right) \theta^{x_{i}}(1-\theta)^{m-x_{i}}, \quad x_{i}=0, \ldots, m$

where $m$ is a known positive integer and $\theta$ is unknown. Show that $\tilde{\theta}=X_{1} / m$ is unbiased for $\theta$.

Show that $T=\sum_{i=1}^{n} X_{i}$ is sufficient for $\theta$ and use the Rao-Blackwell theorem to find another unbiased estimator $\hat{\theta}$ for $\theta$, giving details of your derivation. Calculate the variance of $\hat{\theta}$ and compare it to the variance of $\tilde{\theta}$.

A statistician cannot remember the exact statement of the Rao-Blackwell theorem and calculates $\mathbb{E}\left(T \mid X_{1}\right)$ in an attempt to find an estimator of $\theta$. Comment on the suitability or otherwise of this approach, giving your reasons.

[Hint: If $a$ and $b$ are positive integers then, for $r=0,1, \ldots, a+b$, $\binom{a+b}{r}=\sum_{j=0}^{r}\binom{a}{j}\binom{b}{r-j}$.]

Paper 2, Section I, H

Suppose that, given $\theta$, the random variable $X$ has $\mathbb{P}(X=k)=e^{-\theta} \theta^{k} / k!$, $k=0,1,2, \ldots$. Suppose that the prior density of $\theta$ is $\pi(\theta)=\lambda e^{-\lambda \theta}$, $\theta>0$, for some known $\lambda(>0)$. Derive the posterior density $\pi(\theta \mid x)$ of $\theta$ based on the observation $X=x$.

For a given loss function $L(\theta, a)$, a statistician wants to calculate the value of $a$ that minimises the expected posterior loss

$\int L(\theta, a) \pi(\theta \mid x) d \theta$

Suppose that $x=0$. Find $a$ in terms of $\lambda$ in the following cases:

(a) $L(\theta, a)=(\theta-a)^{2}$;

(b) $L(\theta, a)=|\theta-a|$.

Paper 3, Section II, H

(a) Suppose that $X_{1}, \ldots, X_{n}$ are independent identically distributed random variables, each with density $f(x)=\theta \exp (-\theta x), x>0$ for some unknown $\theta>0$. Use the generalised likelihood ratio to obtain a size $\alpha$ test of $H_{0}: \theta=1$ against $H_{1}: \theta \neq 1$.

(b) A die is loaded so that, if $p_{i}$ is the probability of face $i$, then $p_{1}=p_{2}=\theta_{1}$, $p_{3}=p_{4}=\theta_{2}$ and $p_{5}=p_{6}=\theta_{3}$. The die is thrown $n$ times and face $i$ is observed $x_{i}$ times. Write down the likelihood function for $\theta=\left(\theta_{1}, \theta_{2}, \theta_{3}\right)$ and find the maximum likelihood estimate of $\theta$.

Consider testing whether or not $\theta_{1}=\theta_{2}=\theta_{3}$ for this die. Find the generalised likelihood ratio statistic $\Lambda$ and show that

$2 \log _{e} \Lambda \approx T, \quad \text { where } T=\sum_{i=1}^{3} \frac{\left(o_{i}-e_{i}\right)^{2}}{e_{i}}$

where you should specify $o_{i}$ and $e_{i}$ in terms of $x_{1}, \ldots, x_{6}$. Explain how to obtain an approximate size $0.05$ test using the value of $T$. Explain what you would conclude (and why) if $T=2.03$.

Paper 4, Section II, H

Consider a linear model $\mathbf{Y}=X \boldsymbol{\beta}+\varepsilon$ where $\mathbf{Y}$ is an $n \times 1$ vector of observations, $X$ is a known $n \times p$ matrix, $\boldsymbol{\beta}$ is a $p \times 1$ $(p<n)$ vector of unknown parameters and $\varepsilon$ is an $n \times 1$ vector of independent normally distributed random variables each with mean zero and unknown variance $\sigma^{2}$. Write down the log-likelihood and show that the maximum likelihood estimators $\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^{2}$ of $\boldsymbol{\beta}$ and $\sigma^{2}$ respectively satisfy

$X^{T} X \hat{\boldsymbol{\beta}}=X^{T} \mathbf{Y}, \quad \frac{1}{\hat{\sigma}^{4}}(\mathbf{Y}-X \hat{\boldsymbol{\beta}})^{T}(\mathbf{Y}-X \hat{\boldsymbol{\beta}})=\frac{n}{\hat{\sigma}^{2}}$

(Here ${ }^{T}$ denotes the transpose.) Assuming that $X^{T} X$ is invertible, find the solutions $\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^{2}$ of these equations and write down their distributions.

Prove that $\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^{2}$ are independent.

Consider the model $Y_{i j}=\mu_{i}+\gamma x_{i j}+\varepsilon_{i j}, i=1,2,3$ and $j=1,2,3$. Suppose that, for all $i, x_{i 1}=-1, x_{i 2}=0$ and $x_{i 3}=1$, and that $\varepsilon_{i j}, i, j=1,2,3$, are independent $N\left(0, \sigma^{2}\right)$ random variables where $\sigma^{2}$ is unknown. Show how this model may be written as a linear model and write down $\mathbf{Y}, X, \boldsymbol{\beta}$ and $\varepsilon$. Find the maximum likelihood estimators of $\mu_{i}$ $(i=1,2,3), \gamma$ and $\sigma^{2}$ in terms of the $Y_{i j}$. Derive a $100(1-\alpha) \%$ confidence interval for $\sigma^{2}$ and for $\mu_{2}-\mu_{1}$.

[You may assume that, if $\mathbf{W}=\left(\mathbf{W}_{1}^{T}, \mathbf{W}_{2}^{T}\right)^{T}$ is multivariate normal with $\operatorname{cov}\left(\mathbf{W}_{1}, \mathbf{W}_{2}\right)=0$, then $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ are independent.]

Paper 1, Section I, $\mathbf{7 H} \quad$

Consider an estimator $\hat{\theta}$ of an unknown parameter $\theta$, and assume that $\mathbb{E}_{\theta}\left(\hat{\theta}^{2}\right)<\infty$ for all $\theta$. Define the bias and mean squared error of $\hat{\theta}$.

Show that the mean squared error of $\hat{\theta}$ is the sum of its variance and the square of its bias.

Suppose that $X_{1}, \ldots, X_{n}$ are independent identically distributed random variables with mean $\theta$ and variance $\theta^{2}$, and consider estimators of $\theta$ of the form $k \bar{X}$ where $\bar{X}=\frac{1}{n} \sum_{i=1}^{n} X_{i}$.

(i) Find the value of $k$ that gives an unbiased estimator, and show that the mean squared error of this unbiased estimator is $\theta^{2} / n$.

(ii) Find the range of values of $k$ for which the mean squared error of $k \bar{X}$ is smaller than $\theta^{2} / n$.

Paper 1, Section II, H

Suppose that $X_{1}, X_{2}$, and $X_{3}$ are independent identically distributed Poisson random variables with expectation $\theta$, so that

$\mathbb{P}\left(X_{i}=x\right)=\frac{e^{-\theta} \theta^{x}}{x !}, \quad x=0,1, \ldots$

and consider testing $H_{0}: \theta=1$ against $H_{1}: \theta=\theta_{1}$, where $\theta_{1}$ is a known value greater than 1. Show that the test with critical region $\left\{\left(x_{1}, x_{2}, x_{3}\right): \sum_{i=1}^{3} x_{i}>5\right\}$ is a likelihood ratio test of $H_{0}$ against $H_{1}$. What is the size of this test? Write down an expression for its power.
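Since $\sum_{i=1}^{3} X_{i} \sim$ Poisson$(3)$ under $H_{0}$, the size of the test $\left\{\sum x_{i}>5\right\}$ is $1-F_{3}(5)$. A quick numerical check (not part of the question) computes the Poisson cdf directly:

```python
# Check: under H0 the sum S = X1 + X2 + X3 ~ Poisson(3), so the size of
# the test {S > 5} is 1 - F_3(5); this matches F_3(5) = 0.916 from the
# hint given with the second part of the question.
from math import exp, factorial

def poisson_cdf(x, mean):
    # P(W <= x) for W ~ Poisson(mean)
    return sum(exp(-mean) * mean ** j / factorial(j) for j in range(x + 1))

size = 1 - poisson_cdf(5, 3.0)
print(round(size, 3))    # approximately 0.084

# The power at theta_1 is 1 - F_{3*theta_1}(5), e.g. mean 5 when theta_1 = 5/3.
```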

A scientist counts the number of bird territories in $n$ randomly selected sections of a large park. Let $Y_{i}$ be the number of bird territories in the $i$ th section, and suppose that $Y_{1}, \ldots, Y_{n}$ are independent Poisson random variables with expectations $\theta_{1}, \ldots, \theta_{n}$ respectively. Let $a_{i}$ be the area of the $i$ th section. Suppose that $n=2 m$, $a_{1}=\cdots=a_{m}=a(>0)$ and $a_{m+1}=\cdots=a_{2 m}=2 a$. Derive the generalised likelihood ratio $\Lambda$ for testing

$H_{0}: \theta_{i}=\lambda a_{i} \text { against } H_{1}: \theta_{i}= \begin{cases}\lambda_{1} & i=1, \ldots, m \\ \lambda_{2} & i=m+1, \ldots, 2 m\end{cases}$

What should the scientist conclude about the number of bird territories if $2 \log _{e}(\Lambda)$ is $15.67$?

[Hint: Let $F_{\theta}(x)$ be $\mathbb{P}(W \leqslant x)$ where $W$ has a Poisson distribution with expectation $\theta$. Then

$\left.F_{1}(3)=0.998, \quad F_{3}(5)=0.916, \quad F_{3}(6)=0.966, \quad F_{5}(3)=0.433 .\right]$
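The generalised likelihood ratio statistic for this comparison can be sketched in code. Under $H_{0}$ the single MLE is $\hat{\lambda}=\sum Y_{i} / \sum a_{i}=(S_{1}+S_{2})/(3 m a)$; under $H_{1}$ the group MLEs are $\hat{\lambda}_{1}=S_{1}/m$ and $\hat{\lambda}_{2}=S_{2}/m$, and the Poisson mean terms cancel because the fitted means sum to $\sum Y_{i}$ under both hypotheses. $H_{1}$ has two free parameters against one under $H_{0}$, so $2 \log \Lambda$ is compared with $\chi_{1}^{2}(0.05)=3.84$; a value of $15.67$ would lead to rejecting $H_{0}$. The data below are made up for illustration:

```python
# Hedged sketch (illustrative counts, not from the question): the GLR
# statistic 2*log(Lambda) for the bird-territory model. Only the log-mean
# parts survive, since the fitted Poisson means sum to the same total
# under H0 and H1.
from math import log

def two_log_lambda(y1, y2, a):
    # y1: counts for sections of area a; y2: counts for sections of area 2a
    m, S1, S2 = len(y1), sum(y1), sum(y2)
    lam = (S1 + S2) / (3 * m * a)    # H0 MLE for the common rate
    lam1, lam2 = S1 / m, S2 / m      # H1 MLEs for the two group means
    return 2 * (S1 * log(lam1) + S2 * log(lam2)
                - S1 * log(lam * a) - S2 * log(2 * lam * a))

# Example with made-up counts; compare the result with chi^2_1(0.05) = 3.84
stat = two_log_lambda([4, 6, 5], [7, 8, 9], a=1.0)
print(round(stat, 2))
```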

Paper 2, Section I, H

There are 100 patients taking part in a trial of a new surgical procedure for a particular medical condition. Of these, 50 patients are randomly selected to receive the new procedure and the remaining 50 receive the old procedure. Six months later, a doctor assesses whether or not each patient has fully recovered. The results are shown below:

\begin{tabular}{l|c|c}
& Fully recovered & Not fully recovered \\
\hline Old procedure & 25 & 25 \\
\hline New procedure & 31 & 19
\end{tabular}

The doctor is interested in whether there is a difference in full recovery rates for patients receiving the two procedures. Carry out an appropriate $5 \%$ significance level test, stating your hypotheses carefully. [You do not need to derive the test.] What conclusion should be reported to the doctor?

[Hint: Let $\chi_{k}^{2}(\alpha)$ denote the upper $100 \alpha$ percentage point of a $\chi_{k}^{2}$ distribution. Then

$\left.\chi_{1}^{2}(0.05)=3.84, \chi_{2}^{2}(0.05)=5.99, \chi_{3}^{2}(0.05)=7.82, \chi_{4}^{2}(0.05)=9.49 .\right]$
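A worked numerical check of Pearson's chi-squared test on this $2 \times 2$ table (one degree of freedom), with expected counts computed under independence:

```python
# Check: Pearson's chi-squared statistic for the 2x2 recovery table.
# Expected count e_ij = (row total * column total) / grand total.
obs = [[25, 25],    # old procedure: fully recovered, not fully recovered
       [31, 19]]    # new procedure

row = [sum(r) for r in obs]
col = [sum(obs[i][j] for i in range(2)) for j in range(2)]
total = sum(row)

chi2 = sum((obs[i][j] - row[i] * col[j] / total) ** 2
           / (row[i] * col[j] / total)
           for i in range(2) for j in range(2))

print(round(chi2, 2))   # 1.46, below chi^2_1(0.05) = 3.84: do not reject H0
```

Since $1.46 < 3.84$, there is no evidence at the $5\%$ level of a difference in full recovery rates between the two procedures.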

Paper 3, Section II, H

Suppose that $X_{1}, \ldots, X_{n}$ are independent identically distributed random variables with

$\mathbb{P}\left(X_{i}=x\right)=\left(\begin{array}{c} k \\ x \end{array}\right) \theta^{x}(1-\theta)^{k-x}, \quad x=0, \ldots, k$

where $k$ is known and $\theta(0<\theta<1)$ is an unknown parameter. Find the maximum likelihood estimator $\hat{\theta}$ of $\theta$.

Statistician 1 has prior density for $\theta$ given by $\pi_{1}(\theta)=\alpha \theta^{\alpha-1}, 0<\theta<1$, where $\alpha>1$. Find the posterior distribution for $\theta$ after observing data $X_{1}=x_{1}, \ldots, X_{n}=x_{n}$. Write down the posterior mean $\hat{\theta}_{1}^{(B)}$, and show that

$\hat{\theta}_{1}^{(B)}=c \hat{\theta}+(1-c) \tilde{\theta}_{1}$

where $\tilde{\theta}_{1}$ depends only on the prior distribution and $c$ is a constant in $(0,1)$ that is to be specified.

Statistician 2 has prior density for $\theta$ given by $\pi_{2}(\theta)=\alpha(1-\theta)^{\alpha-1}, 0<\theta<1$. Briefly describe the prior beliefs that the two statisticians hold about $\theta$. Find the posterior mean $\hat{\theta}_{2}^{(B)}$ and show that $\hat{\theta}_{2}^{(B)}<\hat{\theta}_{1}^{(B)}$.

Suppose that $\alpha$ increases (but $n, k$ and the $x_{i}$ remain unchanged). How do the prior beliefs of the two statisticians change? How does $c$ vary? Explain briefly what happens to $\hat{\theta}_{1}^{(B)}$ and $\hat{\theta}_{2}^{(B)}$.

[Hint: The Beta $(\alpha, \beta)(\alpha>0, \beta>0)$ distribution has density

$f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}, \quad 0<x<1$

with expectation $\frac{\alpha}{\alpha+\beta}$ and variance $\frac{\alpha \beta}{(\alpha+\beta+1)(\alpha+\beta)^{2}}$. Here, $\Gamma(\alpha)=\int_{0}^{\infty} x^{\alpha-1} e^{-x} d x, \alpha>0$, is the Gamma function.]
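The posterior-mean decomposition can be verified numerically. With $S=\sum x_{i}$ successes out of $nk$ trials, Statistician 1's posterior is Beta$(S+\alpha, nk-S+1)$, so the posterior mean is $(S+\alpha)/(nk+\alpha+1)=c \hat{\theta}+(1-c)\,\alpha/(\alpha+1)$ with $c=nk/(nk+\alpha+1)$; Statistician 2's posterior is Beta$(S+1, nk-S+\alpha)$ with the smaller mean $(S+1)/(nk+\alpha+1)$ when $\alpha>1$. A sketch with illustrative values of $S, n, k, \alpha$:

```python
# Check of the decomposition theta1_B = c*theta_mle + (1 - c)*alpha/(alpha+1)
# with c = nk/(nk + alpha + 1), and of theta2_B < theta1_B for alpha > 1.

def posterior_means(S, n, k, alpha):
    N = n * k
    mean1 = (S + alpha) / (N + alpha + 1)   # Statistician 1: Beta(S+a, N-S+1)
    mean2 = (S + 1) / (N + alpha + 1)       # Statistician 2: Beta(S+1, N-S+a)
    return mean1, mean2

S, n, k, alpha = 7, 5, 4, 3.0               # illustrative values
theta_mle = S / (n * k)
c = (n * k) / (n * k + alpha + 1)

m1, m2 = posterior_means(S, n, k, alpha)
assert abs(m1 - (c * theta_mle + (1 - c) * alpha / (alpha + 1))) < 1e-12
assert m2 < m1                              # holds whenever alpha > 1
print(m1, m2, c)
```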

Paper 4, Section II, H

Consider a linear model

$\mathbf{Y}=X \boldsymbol{\beta}+\varepsilon \quad(\dagger)$

where $X$ is a known $n \times p$ matrix, $\boldsymbol{\beta}$ is a $p \times 1(p<n)$ vector of unknown parameters and $\varepsilon$ is an $n \times 1$ vector of independent $N\left(0, \sigma^{2}\right)$ random variables with $\sigma^{2}$ unknown. Assume that $X$ has full rank $p$. Find the least squares estimator $\hat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}$ and derive its distribution. Define the residual sum of squares $R S S$ and write down an unbiased estimator $\hat{\sigma}^{2}$ of $\sigma^{2}$.

Suppose that $V_{i}=a+b u_{i}+\delta_{i}$ and $Z_{i}=c+d w_{i}+\eta_{i}$, for $i=1, \ldots, m$, where $u_{i}$ and $w_{i}$ are known with $\sum_{i=1}^{m} u_{i}=\sum_{i=1}^{m} w_{i}=0$, and $\delta_{1}, \ldots, \delta_{m}, \eta_{1}, \ldots, \eta_{m}$ are independent $N\left(0, \sigma^{2}\right)$ random variables. Assume that at least two of the $u_{i}$ are distinct and at least two of the $w_{i}$ are distinct. Show that $\mathbf{Y}=\left(V_{1}, \ldots, V_{m}, Z_{1}, \ldots, Z_{m}\right)^{T}$ (where $T$ denotes transpose) may be written as in ( $\dagger$ ) and identify $X$ and $\boldsymbol{\beta}$. Find $\hat{\boldsymbol{\beta}}$ in terms of the $V_{i}, Z_{i}$, $u_{i}$ and $w_{i}$. Find the distribution of $\hat{b}-\hat{d}$ and derive a $95 \%$ confidence interval for $b-d$.

[Hint: You may assume that $\frac{R S S}{\sigma^{2}}$ has a $\chi_{n-p}^{2}$ distribution, and that $\hat{\boldsymbol{\beta}}$ and the residual sum of squares are independent. Properties of $\chi^{2}$ distributions may be used without proof.]
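Because the covariates are centred ($\sum u_{i}=\sum w_{i}=0$), the least squares estimates decouple: $\hat{a}=\bar{V}$, $\hat{b}=\sum u_{i} V_{i} / \sum u_{i}^{2}$, and similarly for $\hat{c}, \hat{d}$. Then $\hat{b}-\hat{d}$ is normal with variance $\sigma^{2}\left(1/\sum u_{i}^{2}+1/\sum w_{i}^{2}\right)$, and the $95\%$ interval uses $\hat{\sigma}^{2}=R S S/(2m-4)$ with a $t_{2m-4}$ quantile. A minimal sketch with illustrative data (the $t$ quantile would come from tables and is not computed here):

```python
# Sketch (illustrative data, not from the question): fitting the two simple
# regressions with centred covariates, then the standard error of b_hat - d_hat.

def fit_line(y, x):
    # simple regression with centred x: intercept is the mean,
    # slope is the projection onto x
    n = len(y)
    a = sum(y) / n
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b

u = [-2.0, -1.0, 0.0, 1.0, 2.0]      # sums to zero
w = [-1.0, 0.0, 0.0, 0.0, 1.0]       # sums to zero
V = [1.0, 2.1, 2.9, 4.2, 5.0]        # illustrative responses
Z = [0.5, 1.6, 1.4, 1.5, 2.4]

a_hat, b_hat = fit_line(V, u)
c_hat, d_hat = fit_line(Z, w)

rss = sum((v - a_hat - b_hat * ui) ** 2 for v, ui in zip(V, u)) \
    + sum((z - c_hat - d_hat * wi) ** 2 for z, wi in zip(Z, w))
m = len(V)
sigma2_hat = rss / (2 * m - 4)       # unbiased: n - p = 2m - 4 d.o.f.

se = (sigma2_hat * (1 / sum(xi * xi for xi in u)
                    + 1 / sum(xi * xi for xi in w))) ** 0.5
print(b_hat - d_hat, se)             # CI: (b_hat - d_hat) +/- t_{2m-4}(0.025) * se
```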

Paper 1, Section I, H

Let $x_{1}, \ldots, x_{n}$