Paper 4, Section I, K

Statistical Modelling | Part II, 2016

(a) Let $Y_i = x_i^\top \beta + \varepsilon_i$ where $\varepsilon_i$ for $i = 1, \ldots, n$ are independent and identically distributed. Let $Z_i = I(Y_i < 0)$ for $i = 1, \ldots, n$, and suppose that these variables follow a binary regression model with the complementary log-log link function $g(\mu) = \log(-\log(1 - \mu))$. What is the probability density function of $\varepsilon_1$?
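Not part of the question, but a minimal numerical sketch may help fix ideas (Python, with an illustrative choice of error distribution): it checks that the complementary log-log link inverts to $\mu = 1 - \exp(-e^{\eta})$, and that a log-exponential variable $\varepsilon = \log E$ with $E \sim \mathrm{Exp}(1)$, a Gumbel-type distribution of the kind associated with this link, has CDF equal to that inverse link.

```python
import numpy as np

rng = np.random.default_rng(0)

def cloglog(mu):
    """Complementary log-log link g(mu) = log(-log(1 - mu))."""
    return np.log(-np.log(1.0 - mu))

def inv_cloglog(eta):
    """Inverse link: mu = 1 - exp(-exp(eta))."""
    return 1.0 - np.exp(-np.exp(eta))

# 1. The link and its inverse compose to the identity.
eta = np.linspace(-3.0, 3.0, 7)
assert np.allclose(cloglog(inv_cloglog(eta)), eta)

# 2. Illustrative assumption: eps = log(E) with E ~ Exp(1).
#    Then P(eps <= s) = P(E <= e^s) = 1 - exp(-e^s) = inv_cloglog(s),
#    i.e. the CDF of eps is exactly the inverse cloglog link.
eps = np.log(rng.exponential(size=1_000_000))
for s in (-1.0, 0.0, 1.0):
    print(f"s={s:+.1f}  empirical={np.mean(eps <= s):.4f}"
          f"  inv_cloglog={inv_cloglog(s):.4f}")
```

The log-exponential construction here is an assumption for illustration only; deriving the actual density of $\varepsilon_1$ implied by the model is the content of the question.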

(b) The Newton-Raphson algorithm can be applied to compute the MLE, $\hat{\beta}$, in certain GLMs. Starting from $\beta^{(0)} = 0$, we let $\beta^{(t+1)}$ be the maximizer of the quadratic approximation of the log-likelihood $\ell(\beta; Y)$ around $\beta^{(t)}$:

$$\ell(\beta; Y) \approx \ell\big(\beta^{(t)}; Y\big) + \big(\beta - \beta^{(t)}\big)^\top D\ell\big(\beta^{(t)}; Y\big) + \tfrac{1}{2}\big(\beta - \beta^{(t)}\big)^\top D^2 \ell\big(\beta^{(t)}; Y\big)\big(\beta - \beta^{(t)}\big),$$

where $D\ell$ and $D^2 \ell$ are the gradient and Hessian of the log-likelihood. What is the difference between this algorithm and Iterative Weighted Least Squares? Why might the latter be preferable?
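Again not part of the question, but the relationship between the two algorithms can be seen concretely in a small sketch (logistic regression is an assumed example here; its canonical link makes the observed Hessian equal to $-X^\top W X$, which is also minus the Fisher information, so a Newton-Raphson step and an IWLS step coincide exactly). With a non-canonical link, IWLS (Fisher scoring) replaces the observed Hessian by its expectation, which is always negative semi-definite, and each update reduces to a weighted least squares solve.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy logistic-regression data (canonical link), illustrative only.
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -1.0, 0.25])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

def newton_step(beta):
    """One Newton-Raphson step: beta - (D^2 l)^{-1} D l."""
    mu = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted means
    grad = X.T @ (y - mu)                  # score D l
    W = mu * (1.0 - mu)                    # -D^2 l = X^T W X (canonical link)
    return beta + np.linalg.solve(X.T @ (W[:, None] * X), grad)

def iwls_step(beta):
    """One IWLS step: weighted least squares on the working response."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    W = mu * (1.0 - mu)                    # Fisher weights
    z = eta + (y - mu) / W                 # working response
    return np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

beta = np.zeros(p)
for _ in range(5):
    # For the canonical link the two updates agree exactly.
    assert np.allclose(newton_step(beta), iwls_step(beta))
    beta = iwls_step(beta)
print("Estimate after 5 iterations:", beta)
```

The assertion inside the loop is what the canonical-link identity guarantees; with a non-canonical link (e.g. probit) the two steps would generally differ.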
