Paper 1, Section II, K

Statistical Modelling | Part II, 2016

(a) Let $Y$ be an $n$-vector of responses from the linear model $Y = X\beta + \varepsilon$, with $\beta \in \mathbb{R}^{p}$. The internally studentized residual is defined by

$$s_{i} = \frac{Y_{i} - x_{i}^{\top} \hat{\beta}}{\tilde{\sigma} \sqrt{1 - p_{i}}},$$

where $\hat{\beta}$ is the least squares estimate, $p_{i}$ is the leverage of sample $i$, and

$$\tilde{\sigma}^{2} = \frac{\|Y - X\hat{\beta}\|_{2}^{2}}{n - p}.$$

Prove that the joint distribution of $s = (s_{1}, \ldots, s_{n})^{\top}$ is the same in the following two models: (i) $\varepsilon \sim N(0, \sigma I)$, and (ii) $\varepsilon \mid \sigma \sim N(0, \sigma I)$, with $1/\sigma \sim \chi_{\nu}^{2}$ (in this model, $\varepsilon_{1}, \ldots, \varepsilon_{n}$ are identically $t_{\nu}$-distributed). [Hint: A random vector $Z$ is spherically symmetric if for any orthogonal matrix $H$, $HZ \stackrel{d}{=} Z$. If $Z$ is spherically symmetric and a.s. nonzero, then $Z / \|Z\|_{2}$ is a uniform point on the sphere; in addition, any orthogonal projection of $Z$ is also spherically symmetric. A standard normal vector is spherically symmetric.]
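
One observation that may help with (a): the residual vector equals $(I - P)\varepsilon$, where $P$ denotes the hat matrix (a symbol introduced here, not in the question), so $s$ does not change when $\varepsilon$ is multiplied by a positive scalar. The Python sketch below checks this invariance numerically for a simple linear regression with an intercept and one covariate ($p = 2$). The design, coefficients, seed and the helper name `internally_studentized` are illustrative assumptions.

```python
import random


def internally_studentized(x, y):
    """Internally studentized residuals for y ~ intercept + slope * x (p = 2)."""
    n, p = len(x), 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    resid = [yi - y_bar - slope * (xi - x_bar) for xi, yi in zip(x, y)]
    # Leverage in simple linear regression: 1/n + (x_i - x_bar)^2 / Sxx.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    sigma_tilde = (sum(r * r for r in resid) / (n - p)) ** 0.5
    return [r / (sigma_tilde * (1 - h) ** 0.5) for r, h in zip(resid, leverage)]


rng = random.Random(0)          # fixed seed, chosen arbitrarily
n = 12
x = [rng.uniform(0, 10) for _ in range(n)]
beta0, beta1 = 1.5, -0.7        # arbitrary illustrative coefficients
z = [rng.gauss(0, 1) for _ in range(n)]   # standard normal noise

# Model (i): fixed variance, here sigma = 1.
y_fixed = [beta0 + beta1 * xi + zi for xi, zi in zip(x, z)]

# Model (ii): same noise vector, variance sigma drawn with 1/sigma ~ chi^2_nu.
nu = 3
chi2_draw = sum(rng.gauss(0, 1) ** 2 for _ in range(nu))
sigma = 1 / chi2_draw
y_random = [beta0 + beta1 * xi + sigma ** 0.5 * zi for xi, zi in zip(x, z)]

s_fixed = internally_studentized(x, y_fixed)
s_random = internally_studentized(x, y_random)
max_gap = max(abs(a - b) for a, b in zip(s_fixed, s_random))
print(f"largest difference between the two residual vectors: {max_gap:.2e}")
```

Since $s$ is the same for every value of $\sigma$, its conditional law given $\sigma$ in model (ii) coincides with its law in model (i). The hint in the question leads to the same conclusion through spherical symmetry.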

(b) A social scientist regresses the income of 120 Cambridge graduates onto 20 answers from a questionnaire given to the participants in their first year. She notices one questionnaire with very unusual answers, which she suspects was due to miscoding. The sample has a leverage of $0.8$. To check whether this sample is an outlier, she computes its externally studentized residual,

$$t_{i} = \frac{Y_{i} - x_{i}^{\top} \hat{\beta}}{\tilde{\sigma}_{(i)} \sqrt{1 - p_{i}}} = 4.57,$$

where $\tilde{\sigma}_{(i)}$ is estimated from a fit of all samples except the one in question, $(x_{i}, Y_{i})$. Is this a high leverage point? Can she conclude this sample is an outlier at a significance level of $5\%$?
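
The arithmetic behind (b) can be sketched as follows, under assumptions that the question does not state explicitly: the design matrix has $p = 20$ columns (no separate intercept), the errors are normal, and $t_{i}$ is then compared with a $t_{n-p-1}$ distribution. The leverages sum to $p$, so their average is $p/n$. The helpers `t_pdf` and `t_two_sided_p` are hypothetical names for a plain numerical integration of the $t$ density, used here only to keep the example free of third-party libraries.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t_obs, df, steps=20000):
    """P(|T| >= |t_obs|) for T ~ t_df, using Simpson's rule on [0, |t_obs|]."""
    b = abs(t_obs)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = total * h / 3            # P(0 <= T <= b)
    return max(0.0, 1.0 - 2.0 * central)


n, p = 120, 20        # assumption: 20 columns in X, no separate intercept
leverage = 0.8
t_obs = 4.57

mean_leverage = p / n  # leverages sum to p, so this is their average
df = n - p - 1         # one sample is left out when estimating sigma
p_value = t_two_sided_p(t_obs, df)

print(f"average leverage = {mean_leverage:.3f}, observed leverage = {leverage}")
print(f"two-sided p-value on {df} degrees of freedom = {p_value:.2e}")
```

A leverage of $0.8$ is several times the average $p/n$, which common rules of thumb would treat as high. Because the sample was singled out from its questionnaire answers rather than from its residual, a single test at the $5\%$ level is arguably appropriate here.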

(c) After examining the following plot of residuals against the response, the investigator calculates the externally studentized residual of the participant denoted by the black dot, which is $2.33$. Can she conclude this sample is an outlier with a significance level of $5\%$?
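
In (c) the sample was chosen after looking at every residual, so one reasonable treatment is a Bonferroni correction over all $n$ samples. The sketch below assumes the same 120-sample regression as in (b) and uses a normal approximation to the $t$ distribution, which is close when the degrees of freedom are near 100. Both are assumptions made for illustration.

```python
from statistics import NormalDist

n = 120          # assumption: same 120-sample regression as in (b)
alpha = 0.05
t_obs = 2.33

std_normal = NormalDist()

# Unadjusted two-sided p-value under the normal approximation.
p_unadjusted = 2 * (1 - std_normal.cdf(abs(t_obs)))

# Bonferroni: the point was selected after inspecting all n residuals,
# so each individual test is run at level alpha / n.
alpha_per_test = alpha / n
critical_value = std_normal.inv_cdf(1 - alpha_per_test / 2)

is_outlier = p_unadjusted < alpha_per_test
print(f"unadjusted p-value = {p_unadjusted:.4f}")
print(f"Bonferroni level = {alpha_per_test:.5f}, critical |t| = {critical_value:.2f}")
print(f"declared an outlier: {is_outlier}")
```

Under these assumptions the residual would pass an unadjusted $5\%$ test but falls short of the corrected critical value, so the data do not support calling the sample an outlier.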

