Paper 4, Section II, J

Consider the normal linear model where the $n$ -vector of responses $Y$ satisfies $Y=X \beta+\varepsilon$ with $\varepsilon \sim N_{n}\left(0, \sigma^{2} I\right)$ . Here $X$ is an $n \times p$ matrix of predictors with full column rank where $p \geqslant 3$ and $\beta \in \mathbb{R}^{p}$ is an unknown vector of regression coefficients. For $j \in\{1, \ldots, p\}$ , denote the $j$ th column of $X$ by $X_{j}$ , and let $X_{-j}$ be $X$ with its $j$ th column removed. Suppose $X_{1}=1_{n}$ where $1_{n}$ is an $n$ -vector of 1 's. Denote the maximum likelihood estimate of $\beta$ by $\beta$ . Write down the formula for $\hat{\beta}_{j}$ involving $P_{-j}$ , the orthogonal projection onto the column space of $X_{-j}$ .

Consider $j, k \in\{2, \ldots, p\}$ with $j<k$ . By thinking about the orthogonal projection of $X_{j}$ onto $X_{k}$ , show that

\operatorname{var}\left(\hat{\beta}_{j}\right) \geqslant \frac{\sigma^{2}}{\left\|X_{j}\right\|^{2}}\left(1-\left(\frac{X_{k}^{T} X_{j}}{\left\|X_{k}\right\|\left\|X_{j}\right\|}\right)^{2}\right)^{-1}

[You may use standard facts about orthogonal projections including the fact that if $V$ and $W$ are subspaces of $\mathbb{R}^{n}$ with $V$ a subspace of $W$ and $\Pi_{V}$ and $\Pi_{W}$ denote orthogonal projections onto $V$ and $W$ respectively, then for all $v \in \mathbb{R}^{n},\left\|\Pi_{W} v\right\|^{2} \geqslant\left\|\Pi_{V} v\right\|^{2}$ .]

By considering the fitted values $X \hat{\beta}$ , explain why if, for any $j \geqslant 2$ , a constant is added to each entry in the $j$ th column of $X$ , then $\hat{\beta}_{j}$ will remain unchanged. Let $\bar{X}_{j}=\sum_{i=1}^{n} X_{i j} / n$ . Why is (*) also true when all instances of $X_{j}$ and $X_{k}$ are replaced by $X_{j}-\bar{X}_{j} 1_{n}$ and $X_{k}-\bar{X}_{k} 1_{n}$ respectively?

The marks from mid-year statistics and mathematics tests and an end-of-year statistics exam are recorded for 100 secondary school students. The first few lines of the data are given below.

The following abbreviated output is obtained:

What are the hypothesis tests corresponding to the final column of the coefficients table? What is the hypothesis test corresponding to the final line of the output? Interpret the results when testing at the $5 \%$ level.

How does the following sample correlation matrix for the data help to explain the relative sizes of some of the $p$ -values?