Paper 1, Section II, J

Statistical Modelling | Part II, 2011

The data consist of the record times in 1984 for 35 Scottish hill races. The columns list the distance in miles (dist), the total height gained during the route (climb), and the record time in minutes (time). The data are displayed in R as follows (abbreviated):

$$
\begin{array}{lrrr}
\texttt{> hills} & & & \\
 & \texttt{dist} & \texttt{climb} & \texttt{time} \\
\text{Greenmantle} & 2.5 & 650 & 16.083 \\
\text{Carnethy} & 6.0 & 2500 & 48.350 \\
\text{Craig Dunain} & 6.0 & 900 & 33.650 \\
\text{Ben Rha} & 7.5 & 800 & 45.600 \\
\text{Ben Lomond} & 8.0 & 3070 & 62.267 \\
\text{[...]} & & & \\
\text{Cockleroi} & 4.5 & 850 & 28.100 \\
\text{Moffat Chase} & 20.0 & 5000 & 159.833
\end{array}
$$

Consider a linear regression of time on dist and climb. Write down this model mathematically, and explain any assumptions that you make. How would you instruct R to fit this model and assign it to a variable hills.lm1?
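One standard way to write the model is $\text{time}_i = \beta_0 + \beta_1\,\text{dist}_i + \beta_2\,\text{climb}_i + \varepsilon_i$, $i = 1, \dots, 35$, with the $\varepsilon_i$ independent $N(0, \sigma^2)$ errors; in R this is typically fitted with `hills.lm1 <- lm(time ~ dist + climb, data = hills)`, assuming the data frame is called `hills` as in the display. The sketch below shows what such a fit computes: ordinary least squares via the normal equations $(X^\top X)\hat\beta = X^\top y$. It is a minimal illustration, assuming only the seven races actually shown above (so its coefficients will not match a fit to all 35 races); `solve` and `fit_ols` are hypothetical helper names.

```python
# Minimal sketch: ordinary least squares for time ~ dist + climb via the
# normal equations (X^T X) beta = X^T y. Uses ONLY the seven races shown in
# the abbreviated display, so the numbers will differ from a fit to all 35.
# solve and fit_ols are hypothetical helper names, not part of any library.

# (race, dist in miles, climb, time in minutes), copied from the display
ROWS = [
    ("Greenmantle", 2.5, 650, 16.083),
    ("Carnethy", 6.0, 2500, 48.350),
    ("Craig Dunain", 6.0, 900, 33.650),
    ("Ben Rha", 7.5, 800, 45.600),
    ("Ben Lomond", 8.0, 3070, 62.267),
    ("Cockleroi", 4.5, 850, 28.100),
    ("Moffat Chase", 20.0, 5000, 159.833),
]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows):
    """Return (beta, fitted) for time_i = b0 + b1*dist_i + b2*climb_i + error_i."""
    X = [[1.0, d, c] for _, d, c, _ in rows]  # design matrix with intercept column
    y = [t for *_, t in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    return beta, fitted


beta, fitted = fit_ols(ROWS)
print("intercept, dist, climb coefficients:", beta)
```

A useful sanity check on any least-squares fit with an intercept is that the residuals sum to zero and are orthogonal to each regressor column.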

First, we test the hypothesis of no linear relationship between time and the variables dist and climb against the full model. R provides the following ANOVA summary:

Using the information in this table, explain carefully how you would test this hypothesis. What do you conclude?
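A sketch of the usual overall F-test, assuming the ANOVA table supplies the residual sum of squares $\mathrm{RSS}_0$ of the intercept-only model (on $n-1 = 34$ degrees of freedom) and $\mathrm{RSS}_1$ of the full model (on $n-p = 32$ degrees of freedom, with $p = 3$ parameters). Under $H_0: \beta_1 = \beta_2 = 0$, the statistic $F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/2}{\mathrm{RSS}_1/32}$ has an $F_{2,32}$ distribution, and $H_0$ is rejected for large $F$. The function names below are hypothetical; the p-value helper relies on the closed form $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, which holds only when the numerator degrees of freedom equal 2.

```python
# Minimal sketch of the overall F-test for H0: beta_1 = beta_2 = 0.
# f_statistic, f_pvalue_df1_2 and f_critical_df1_2 are hypothetical helper names.


def f_statistic(rss_null, rss_full, n, p):
    """F statistic comparing the intercept-only model with the full model.

    rss_null: residual sum of squares of the intercept-only model (the total
              sum of squares about the mean), on n - 1 degrees of freedom.
    rss_full: residual sum of squares of time ~ dist + climb, on n - p df.
    """
    return ((rss_null - rss_full) / (p - 1)) / (rss_full / (n - p))


def f_pvalue_df1_2(f, df2):
    """P(F > f) for F ~ F(2, df2); this closed form is valid ONLY for numerator df = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Upper-alpha point of F(2, df2), obtained by inverting f_pvalue_df1_2."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


n, p = 35, 3  # 35 races; parameters beta_0, beta_1, beta_2
f_crit = f_critical_df1_2(0.05, n - p)
print(f"reject H0 at the 5% level if F > {f_crit:.3f}")
```

The critical value this produces, about 3.29, agrees with the tabulated 5% point of $F_{2,32}$; plugging the table's sums of squares into `f_statistic` and comparing with it (or reading off the p-value) gives the test.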

The R command

summary(hills.lm1)

provides the following (slightly abbreviated) summary:

Carefully explain the information that appears in each column of the table. What are your conclusions? In particular, how would you test for the significance of the variable climb in this model?
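In the standard R coefficient table from `summary` on an `lm` fit, the columns are Estimate ($\hat\beta_j$), Std. Error ($s\sqrt{[(X^\top X)^{-1}]_{jj}}$, with $s^2 = \mathrm{RSS}/(n-p)$), t value (Estimate divided by Std. Error) and Pr(>|t|) (the two-sided p-value from a $t_{n-p}$ distribution). The sketch below computes the climb row by hand and checks the identity that the squared t value equals the F statistic for dropping climb from the model, so the two tests of $H_0: \beta_2 = 0$ agree. It assumes only the seven displayed races (so $n - p = 4$ here rather than 32); `inverse` and `ols` are hypothetical helpers.

```python
import math

# (race, dist in miles, climb, time in minutes): ONLY the seven rows displayed above
ROWS = [
    ("Greenmantle", 2.5, 650, 16.083),
    ("Carnethy", 6.0, 2500, 48.350),
    ("Craig Dunain", 6.0, 900, 33.650),
    ("Ben Rha", 7.5, 800, 45.600),
    ("Ben Lomond", 8.0, 3070, 62.267),
    ("Cockleroi", 4.5, 850, 28.100),
    ("Moffat Chase", 20.0, 5000, 159.833),
]


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]  # scale the pivot row so the pivot is 1
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, y):
    """Return (beta, rss, (X^T X)^{-1}) for the least-squares fit of y on the columns of X."""
    p = len(X[0])
    xtx_inv = inverse([[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)])
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2 for x, yi in zip(X, y))
    return beta, rss, xtx_inv


y = [t for *_, t in ROWS]
n, p = len(y), 3
X_full = [[1.0, d, c] for _, d, c, _ in ROWS]  # time ~ dist + climb
X_red = [[1.0, d] for _, d, _, _ in ROWS]      # time ~ dist (climb dropped)

beta, rss_full, xtx_inv = ols(X_full, y)
_, rss_red, _ = ols(X_red, y)

s2 = rss_full / (n - p)                   # residual mean square, estimates sigma^2
se_climb = math.sqrt(s2 * xtx_inv[2][2])  # "Std. Error" entry for climb
t_climb = beta[2] / se_climb              # "t value" entry for climb
f_drop = (rss_red - rss_full) / s2        # F for dropping climb, on 1 and n - p df

print(f"t = {t_climb:.4f}, t^2 = {t_climb ** 2:.4f}, partial F = {f_drop:.4f}")
```

On the full data the same reasoning applies with 32 residual degrees of freedom: compare $|t|$ for climb with the upper $2.5\%$ point of $t_{32}$, or equivalently read its Pr(>|t|) entry.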

Figure 1: Hills data: diagnostic plots

Finally, we perform model diagnostics on the full model by looking at studentised residuals versus fitted values, and at the normal QQ-plot. The plots are displayed in Figure 1. Comment on possible sources of model misspecification. Is it possible that the problem lies with the data? If so, what do you suggest?
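The quantities behind such diagnostic plots can be sketched as follows, assuming internally studentised residuals $r_i = e_i / (s\sqrt{1 - h_{ii}})$ (what R's `rstandard` returns; `rstudent` gives the externally studentised version), where $h_{ii} = x_i^\top (X^\top X)^{-1} x_i$ is the leverage. For the QQ-plot, the sorted $r_i$ are paired with standard normal quantiles at the plotting positions $(i - 0.5)/n$, one common choice (R's `qqnorm` uses the slightly different `ppoints`). The flagging thresholds $|r_i| > 2$ and $h_{ii} > 2p/n$ are rules of thumb, not formal tests. Again only the seven displayed races are used, and `inverse` is a hypothetical helper.

```python
import math
from statistics import NormalDist

# (race, dist in miles, climb, time in minutes): ONLY the seven rows displayed above
ROWS = [
    ("Greenmantle", 2.5, 650, 16.083),
    ("Carnethy", 6.0, 2500, 48.350),
    ("Craig Dunain", 6.0, 900, 33.650),
    ("Ben Rha", 7.5, 800, 45.600),
    ("Ben Lomond", 8.0, 3070, 62.267),
    ("Cockleroi", 4.5, 850, 28.100),
    ("Moffat Chase", 20.0, 5000, 159.833),
]


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


X = [[1.0, d, c] for _, d, c, _ in ROWS]  # time ~ dist + climb
y = [t for *_, t in ROWS]
n, p = len(X), len(X[0])

xtx_inv = inverse([[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)])
xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
resid = [yi - fi for yi, fi in zip(y, fitted)]
s = math.sqrt(sum(e * e for e in resid) / (n - p))  # residual standard error

# leverages: diagonal of the hat matrix X (X^T X)^{-1} X^T
lev = [sum(x[i] * xtx_inv[i][j] * x[j] for i in range(p) for j in range(p)) for x in X]
# internally studentised residuals
stud = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]

# normal QQ-plot coordinates: (theoretical quantile, sorted residual, race)
order = sorted(range(n), key=lambda i: stud[i])
theo = [NormalDist().inv_cdf((k + 0.5) / n) for k in range(n)]
qq = [(theo[k], stud[i], ROWS[i][0]) for k, i in enumerate(order)]

for (name, *_), f, r, h in zip(ROWS, fitted, stud, lev):
    flag = "  <- check" if abs(r) > 2 or h > 2 * p / n else ""
    print(f"{name:13s} fitted={f:8.2f} r={r:6.2f} h={h:5.2f}{flag}")
```

A point with both large leverage and a large studentised residual has a strong pull on the fit, which is one way to decide whether the problem lies with individual observations rather than with the form of the model.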
