Paper 4, Section II, H

Explain the notion of a sufficient statistic.

Suppose $X$ is a random variable with distribution $F$ taking values in $\{1, \ldots, 6\}$, with $P(X=i)=p_{i}$. Let $x_{1}, \ldots, x_{n}$ be a sample from $F$. Suppose $n_{i}$ is the number of these $x_{j}$ that are equal to $i$. Use a factorization criterion to explain why $\left(n_{1}, \ldots, n_{6}\right)$ is sufficient for $\theta=\left(p_{1}, \ldots, p_{6}\right)$.

Let $H_{0}$ be the hypothesis that $p_{i}=1 / 6$ for all $i$. Derive the statistic of the generalized likelihood ratio test of $H_{0}$ against the alternative that this is not a good fit.

Assuming that $n_{i} \approx n / 6$ when $H_{0}$ is true and $n$ is large, show that this test can be approximated by a chi-squared test using a test statistic

$$T=-n+\frac{6}{n} \sum_{i=1}^{6} n_{i}^{2}.$$
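As an informal numerical check of the approximation, the following Python sketch compares $T$ with the generalized likelihood ratio statistic, taken here in its usual form $2 \sum_{i} n_{i} \log \left(6 n_{i} / n\right)$, and with Pearson's statistic $\sum_{i}\left(n_{i}-n / 6\right)^{2} /(n / 6)$. The function names and the counts are hypothetical, chosen only for illustration; they are not part of the question.

```python
import math


def glr_statistic(counts):
    """2 log(likelihood ratio) for H0: all p_i equal, against an unrestricted alternative."""
    n = sum(counts)
    expected = n / len(counts)  # n/6 when there are six categories
    # Empty cells contribute nothing, since x log x -> 0 as x -> 0.
    return 2.0 * sum(c * math.log(c / expected) for c in counts if c > 0)


def t_statistic(counts):
    """T = -n + (6/n) * sum of n_i^2, written for k = len(counts) categories."""
    n = sum(counts)
    k = len(counts)
    return -n + (k / n) * sum(c * c for c in counts)


def pearson_statistic(counts):
    """Pearson's chi-squared statistic with equal expected counts n/k."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical counts with n = 100, each close to n/6.
counts = [18, 15, 20, 14, 17, 16]
glr = glr_statistic(counts)
t = t_statistic(counts)
pearson = pearson_statistic(counts)
print(f"2 log LR = {glr:.4f}, T = {t:.4f}, Pearson = {pearson:.4f}")
```

For counts of this kind, $T$ agrees with Pearson's statistic up to rounding error, and both are close to the likelihood ratio statistic, which is what the second-order expansion of the logarithm suggests.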

Suppose $n=100$ and $T=8.12$. Would you reject $H_{0}$? Explain your answer.
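One way to check the final part numerically is sketched below in Python. It assumes the reference distribution is chi-squared with $5$ degrees of freedom (six cell probabilities constrained to sum to one, with $H_{0}$ fully specified) and a conventional $5\%$ significance level; the level is an assumption, since the question does not fix one. The helper `chi2_sf_5df` is a hypothetical name and uses the closed-form upper tail probability available for $5$ degrees of freedom.

```python
import math


def chi2_sf_5df(x):
    """Upper-tail probability P(X > x) for X ~ chi-squared with 5 degrees of freedom."""
    if x <= 0:
        return 1.0
    # Closed form for 5 degrees of freedom:
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3)
    return (
        math.erfc(math.sqrt(x / 2.0))
        + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * (1.0 + x / 3.0)
    )


observed_t = 8.12   # value given in the question
alpha = 0.05        # assumed significance level

p_value = chi2_sf_5df(observed_t)
reject = p_value < alpha
print(f"p-value = {p_value:.3f}, reject H0 at the 5% level: {reject}")
```

Under these assumptions the p-value is roughly $0.15$, and $8.12$ lies below the upper $5\%$ point of $\chi_{5}^{2}$ (about $11.07$), so this sketch would not reject $H_{0}$.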