Problem 9
(a) Consider a normal linear model \(y=X \beta+\varepsilon\) where \(\operatorname{var}(\varepsilon)=\sigma^{2} W^{-1}\), and \(W\) is a known positive definite symmetric matrix. Show that an inverse square root matrix \(W^{1 / 2}\) exists, and re-express the least squares problem in terms of \(y_{1}=W^{1 / 2} y\), \(X_{1}=W^{1 / 2} X\), and \(\varepsilon_{1}=W^{1 / 2} \varepsilon\). Show that \(\operatorname{var}\left(\varepsilon_{1}\right)=\sigma^{2} I_{n}\). Hence find the least squares estimates, hat matrix, and residual sum of squares for the weighted regression in terms of \(y\), \(X\), and \(W\), and give the distributions of the least squares estimates of \(\beta\) and the residual sum of squares. (b) Suppose that \(W\) depends on an unknown scalar parameter, \(\rho\). Find the profile log likelihood for \(\rho\), \(\ell_{\mathrm{p}}(\rho)=\max _{\beta, \sigma^{2}} \ell\left(\beta, \sigma^{2}, \rho\right)\), and outline how to use a least squares package to give a confidence interval for \(\rho\).
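As a numerical sanity check on part (a), the sketch below (illustrative data and variable names, not part of the problem) verifies that ordinary least squares on the transformed data \(y_{1}=W^{1/2}y\), \(X_{1}=W^{1/2}X\) reproduces the weighted least squares formula \((X^{T}WX)^{-1}X^{T}Wy\):

```python
import numpy as np

# Illustrative weighted regression: var(eps) = sigma^2 * W^{-1} with sigma^2 = 1.
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
W = np.diag(rng.uniform(0.5, 2.0, size=n))        # known positive definite weights
beta = np.array([1.0, 2.0, -1.0])
eps = rng.multivariate_normal(np.zeros(n), np.linalg.inv(W))
y = X @ beta + eps

# W^{1/2} via the spectral decomposition W = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(W)
W_half = Q @ np.diag(np.sqrt(lam)) @ Q.T

# OLS on the transformed data (y1, X1) ...
y1, X1 = W_half @ y, W_half @ X
beta_ols = np.linalg.lstsq(X1, y1, rcond=None)[0]

# ... equals the weighted least squares solution of (X^T W X) b = X^T W y
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
assert np.allclose(beta_ols, beta_wls)

# Weighted residual sum of squares (y - X b)^T W (y - X b)
r = y - X @ beta_wls
SS = r @ W @ r
```

The spectral decomposition step is the constructive existence argument the problem asks for: any positive definite symmetric \(W\) has such a symmetric square root.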
Problem 11
Suppose that we wish to construct the likelihood ratio statistic for comparison of the two linear models \(y=X_{1} \beta_{1}+\varepsilon\) and \(y=X_{1} \beta_{1}+X_{2} \beta_{2}+\varepsilon\), where the components of \(\varepsilon\) are independent normal variables with mean zero and variance \(\sigma^{2}\); call the corresponding residual sums of squares \(SS_{1}\) and \(SS\) on \(v_{1}\) and \(v\) degrees of freedom. (a) Show that the maximum value of the log likelihood is \(-\frac{1}{2} n(\log SS+1-\log n)\) for a model whose residual sum of squares is \(SS\), and deduce that the likelihood ratio statistic for comparison of the models above is \(W=n \log \left(SS_{1} / SS\right)\). (b) By writing \(SS_{1}=SS+\left(SS_{1}-SS\right)\), show that \(W\) is a monotonic function of the \(F\) statistic for comparison of the models. (c) Show that \(W \doteq\left(v_{1}-v\right) F\) when \(n\) is large and \(v\) is close to \(n\), and say why \(F\) would usually be preferred to \(W\).
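The identity behind part (b) can be checked numerically. The sketch below (with illustrative data) confirms that \(W=n \log (SS_{1}/SS)\) equals \(n \log \{1+(v_{1}-v)F/v\}\), an increasing function of \(F\):

```python
import numpy as np

# Nested linear models: the small model omits the covariate x2.
rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])        # y = X1 beta1 + eps
X_big = np.column_stack([np.ones(n), x1, x2])      # adds the X2 beta2 term

def rss(X, y):
    """Residual sum of squares from a least squares fit."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

SS1, SS = rss(X_small, y), rss(X_big, y)
v1, v = n - X_small.shape[1], n - X_big.shape[1]   # residual degrees of freedom

F = ((SS1 - SS) / (v1 - v)) / (SS / v)
W = n * np.log(SS1 / SS)
# Part (b): W = n log(1 + (v1 - v) F / v); part (c) follows from log(1+x) ~ x.
assert np.isclose(W, n * np.log(1 + (v1 - v) * F / v))
```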
Problem 14
In the normal straight-line regression model it is thought that a power transformation of the covariate may be needed, that is, the model $$ y=\beta_{0}+\beta_{1} x^{(\lambda)}+\varepsilon $$ may be suitable, where \(x^{(\lambda)}\) is the power transformation $$ x^{(\lambda)}= \begin{cases}\frac{x^{\lambda}-1}{\lambda}, & \lambda \neq 0, \\ \log x, & \lambda=0 .\end{cases} $$ (a) Show by Taylor series expansion of \(x^{(\lambda)}\) at \(\lambda=1\) that a test for power transformation can be based on the reduction in sum of squares when the constructed variable \(x \log x\) is added to the model with linear predictor \(\beta_{0}+\beta_{1} x\). (b) Show that the profile log likelihood for \(\lambda\) is equivalent to \(\ell_{\mathrm{p}}(\lambda) \equiv-\frac{n}{2} \log SS\left(\widehat{\beta}_{\lambda}\right)\), where \(SS\left(\widehat{\beta}_{\lambda}\right)\) is the residual sum of squares for regression of \(y\) on the \(n \times 2\) design matrix with a column of ones and the column consisting of the \(x_{j}^{(\lambda)}\). Why is a Jacobian for the transformation not needed in this case, unlike in Example 8.23? (Box and Tidwell, 1962)
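Part (b) suggests a simple computational recipe: evaluate \(\ell_{\mathrm{p}}(\lambda)=-\frac{n}{2}\log SS(\widehat{\beta}_{\lambda})\) over a grid and maximise. A minimal sketch, with illustrative data generated under \(\lambda=0\) (i.e. \(\log x\)):

```python
import numpy as np

# Illustrative data: true transformation is log x (lambda = 0), small noise.
rng = np.random.default_rng(2)
n = 60
x = rng.uniform(0.5, 5.0, size=n)
y = 1.0 + 2.0 * np.log(x) + 0.1 * rng.normal(size=n)

def power_transform(x, lam):
    """The power transform x^(lambda), with the lambda -> 0 limit log x."""
    return np.log(x) if abs(lam) < 1e-8 else (x**lam - 1.0) / lam

def profile_loglik(lam):
    """Regress y on (1, x^(lambda)); profile log likelihood is -(n/2) log SS."""
    X = np.column_stack([np.ones(n), power_transform(x, lam)])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return -0.5 * n * np.log(r @ r)

grid = np.linspace(-1.0, 2.0, 301)
lam_hat = grid[np.argmax([profile_loglik(l) for l in grid])]   # should lie near 0
```

No Jacobian enters because only the covariate, not the response, is transformed; the \(y_j\) keep the same density for every \(\lambda\).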
Problem 16
(a) Show that AIC for a normal linear model with \(n\) responses, \(p\) covariates and unknown \(\sigma^{2}\) may be written as \(n \log \widehat{\sigma}^{2}+2 p\), where \(\widehat{\sigma}^{2}=SS_{p} / n\) is the maximum likelihood estimate of \(\sigma^{2}\). If \(\widehat{\sigma}_{0}^{2}\) is the unbiased estimate under some fixed correct model with \(q\) covariates, show that use of AIC is equivalent to use of \(n \log \left\{1+\left(\widehat{\sigma}^{2}-\widehat{\sigma}_{0}^{2}\right) / \widehat{\sigma}_{0}^{2}\right\}+2 p\), and that this is roughly equal to \(n\left(\widehat{\sigma}^{2} / \widehat{\sigma}_{0}^{2}-1\right)+2 p\). Deduce that model selection using \(C_{p}\) approximates that using AIC. (b) Show that \(C_{p}=(q-p)(F-1)+p\), where \(F\) is the \(F\) statistic for comparison of the models with \(p\) and \(q>p\) covariates, and deduce that if the model with \(p\) covariates is correct, then \(\mathrm{E}\left(C_{p}\right) \doteq q\), but that otherwise \(\mathrm{E}\left(C_{p}\right)>q\).
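The identity in part (b) is exact and easy to confirm numerically. The sketch below (illustrative data) computes \(C_{p}=SS_{p}/\widehat{\sigma}_{0}^{2}-n+2p\) with \(\widehat{\sigma}_{0}^{2}=SS_{q}/(n-q)\) and checks it against \((q-p)(F-1)+p\):

```python
import numpy as np

# Illustrative nested models with p = 2 and q = 4 covariates (intercepts included).
rng = np.random.default_rng(3)
n, p, q = 50, 2, 4
Z = rng.normal(size=(n, q - 1))
y = 1.0 + Z[:, 0] + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from a least squares fit."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

X_p = np.column_stack([np.ones(n), Z[:, :p - 1]])   # smaller model, p columns
X_q = np.column_stack([np.ones(n), Z])              # larger (correct) model, q columns
SS_p, SS_q = rss(X_p, y), rss(X_q, y)

sigma0_sq = SS_q / (n - q)                 # unbiased estimate under the larger model
Cp = SS_p / sigma0_sq - n + 2 * p
F = ((SS_p - SS_q) / (q - p)) / sigma0_sq  # F statistic comparing the two models
assert np.isclose(Cp, (q - p) * (F - 1) + p)
```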
Problem 17
Consider the straight-line regression model \(y_{j}=\alpha+\beta x_{j}+\sigma \varepsilon_{j}\), \(j=1, \ldots, n\). Suppose that \(\sum x_{j}=0\) and that the \(\varepsilon_{j}\) are independent with means zero, variances \(v\), and common density \(f(\cdot)\). (a) Write down the variance of the least squares estimate of \(\beta\). (b) Show that if \(\sigma\) is known, the log likelihood for the data is $$ \ell(\alpha, \beta)=-n \log \sigma+\sum_{j=1}^{n} \log f\left(\frac{y_{j}-\alpha-\beta x_{j}}{\sigma}\right); $$ derive the expected information matrix for \(\alpha\) and \(\beta\), and show that the asymptotic variance of the maximum likelihood estimate of \(\beta\) can be written as \(\sigma^{2} /\left(i \sum x_{j}^{2}\right)\), where $$ i=\mathrm{E}\left\{-\frac{d^{2} \log f(\varepsilon)}{d \varepsilon^{2}}\right\}. $$ Hence show that the least squares estimate of \(\beta\) has asymptotic relative efficiency \((i v)^{-1} \times 100 \%\). (c) Show that the cumulant-generating function of the Gumbel distribution, \(f(u)=\exp \{-u-\exp (-u)\}\), \(-\infty<u<\infty\), is \(K(t)=\log \Gamma(1-t)\) for \(t<1\), and hence find \(i\) and \(v\) for this density.
Problem 18
Over a period of 90 days a study was carried out on 1500 women. Its purpose was to investigate the relation between obstetrical practices and the time spent in the delivery suite by women giving birth. One thing that greatly affects this time is whether or not a woman has previously given birth. Unfortunately this vital information was lost, giving the researchers three options: (a) abandon the study; (b) go back to the medical records and find which women had previously given birth (very time-consuming); or (c) for each day check how many women had previously given birth (relatively quick). The statistical question arising was whether (c) would recover enough information about the parameter of interest. Suppose that a linear model is appropriate for log time in delivery suite, and that the log time for a first delivery is normally distributed with mean \(\mu+\alpha\) and variance \(\sigma^{2}\), whereas for subsequent deliveries the mean log time is \(\mu\). Suppose that the times for all the women are independent, and that for each there is a probability \(\pi\) that the labour is her first, independent of the others. Further suppose that the women are divided into \(k\) groups corresponding to days and that each group has size \(m\); the overall number is \(n=m k\). Under (c), show that the average log time on day \(j\), \(Z_{j}\), is normally distributed with mean \(\mu+R_{j} \alpha / m\) and variance \(\sigma^{2} / m\), where \(R_{j}\) is binomial with probability \(\pi\) and denominator \(m\). Hence show that the overall log likelihood is $$ \ell(\mu, \alpha)=-\frac{1}{2} k \log \left(2 \pi \sigma^{2} / m\right)-\frac{m}{2 \sigma^{2}} \sum_{j=1}^{k}\left(z_{j}-\mu-r_{j} \alpha / m\right)^{2}, $$ where \(z_{j}\) and \(r_{j}\) are the observed values of \(Z_{j}\) and \(R_{j}\) and we take \(\pi\) and \(\sigma^{2}\) to be known.
If \(R_{j}\) has mean \(m \pi\) and variance \(m \tau^{2}\), show that the inverse expected information matrix is $$ I(\mu, \alpha)^{-1}=\frac{\sigma^{2}}{n \tau^{2}}\left(\begin{array}{cc} m \pi^{2}+\tau^{2} & -m \pi \\ -m \pi & m \end{array}\right) $$ (i) If \(m=1\), \(\tau^{2}=\pi(1-\pi)\), and \(\pi=n_{1} / n\), where \(n=n_{0}+n_{1}\), show that \(I(\mu, \alpha)^{-1}\) equals the variance matrix for the two-sample regression model. Explain why. (ii) If \(\tau^{2}=0\), show that neither \(\mu\) nor \(\alpha\) is estimable; explain why. (iii) If \(\tau^{2}=\pi(1-\pi)\), show that \(\mu\) is not estimable when \(\pi=1\), and that \(\alpha\) is not estimable when \(\pi=0\) or \(\pi=1\). Explain why the conditions for these two parameters to be estimable differ in form. (iv) Show that the effect of grouping \((m>1)\) is that \(\operatorname{var}(\widehat{\alpha})\) is increased by a factor \(m\), regardless of \(\pi\) and \(\sigma^{2}\). (v) It was known that \(\sigma^{2} \doteq 0.2\), \(m \doteq 1500 / 90\), \(\pi \doteq 0.3\). Calculate the standard error for \(\widehat{\alpha}\). It was known from other studies that first deliveries are typically 20-25\% longer than subsequent ones. Show that an effect of size \(\alpha=\log (1.25)\) would be very likely to be detected based on the grouped data, but that an effect of size \(\alpha=\log (1.20)\) would be less certain to be detected, and discuss the implications.
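The arithmetic for part (v) is a direct plug-in to the \((\alpha,\alpha)\) entry of \(I(\mu,\alpha)^{-1}\), which gives \(\operatorname{var}(\widehat{\alpha})=\sigma^{2}m/(n\tau^{2})\) with \(\tau^{2}=\pi(1-\pi)\). A quick sketch, using the numbers given in the problem:

```python
import numpy as np

# Values given in part (v); tau^2 = pi(1 - pi) as in (iii).
sigma2 = 0.2
n = 1500
m = 1500 / 90
pi_ = 0.3
tau2 = pi_ * (1 - pi_)

# (alpha, alpha) entry of the inverse information matrix
var_alpha = sigma2 * m / (n * tau2)
se_alpha = np.sqrt(var_alpha)          # roughly 0.10

# Standardised effect sizes for the two scenarios discussed
z_125 = np.log(1.25) / se_alpha        # > 2 standard errors: likely detected
z_120 = np.log(1.20) / se_alpha        # < 2 standard errors: less certain
```

The grouped design thus leaves \(\alpha=\log(1.25)\) more than two standard errors from zero but \(\alpha=\log(1.20)\) under two, which is the contrast the problem asks you to discuss.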
Problem 21
Data \(\left(x_{1}, y_{1}\right), \ldots,\left(x_{n}, y_{n}\right)\) satisfy the straight-line regression model (5.3). In a calibration problem the value \(y_{+}\) of a new response independent of the existing data has been observed, and inference is required for the unknown corresponding value \(x_{+}\) of \(x\). (a) Let \(s_{x}^{2}=\sum\left(x_{j}-\bar{x}\right)^{2}\) and let \(S^{2}\) be the unbiased estimator of the error variance \(\sigma^{2}\). Show that $$ T\left(x_{+}\right)=\frac{Y_{+}-\widehat{\gamma}_{0}-\widehat{\gamma}_{1}\left(x_{+}-\bar{x}\right)}{\left[S^{2}\left\{1+n^{-1}+\left(x_{+}-\bar{x}\right)^{2} / s_{x}^{2}\right\}\right]^{1 / 2}} $$ is a pivot, and explain why the set $$ \mathcal{X}_{1-2 \alpha}=\left\{x_{+}: t_{n-2}(\alpha) \leq T\left(x_{+}\right) \leq t_{n-2}(1-\alpha)\right\} $$ contains \(x_{+}\) with probability \(1-2 \alpha\). (b) Show that the function \(g(u)=(a+b u) /\left(c+u^{2}\right)^{1 / 2}\), \(c>0\), \(a, b \neq 0\), has exactly one stationary point, at \(\tilde{u}=b c / a\), that \(\operatorname{sign} g(\tilde{u})=\operatorname{sign} a\), that \(g(\tilde{u})\) is a local maximum if \(a>0\) and a local minimum if \(a<0\), and that \(\lim _{u \rightarrow \pm \infty} g(u)=\pm b\). Hence sketch \(g(u)\) in the four possible cases \(a, b<0\); \(a, b>0\); \(a<0<b\); and \(b<0<a\).
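In practice the confidence set of part (a) is found by inverting the pivot numerically. The sketch below uses illustrative straight-line data and an assumed new response \(y_{+}\) (both hypothetical, chosen only to exercise the construction), and collects the \(x_{+}\) values where \(|T(x_{+})|\) stays within the \(t_{n-2}\) quantiles:

```python
import numpy as np
from scipy import stats

# Illustrative data from a straight line y = 2 + 0.5 (x - xbar) + noise.
rng = np.random.default_rng(4)
n = 30
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 0.5 * (x - x.mean()) + rng.normal(scale=0.3, size=n)

xbar = x.mean()
sx2 = np.sum((x - xbar) ** 2)
g1 = np.sum((x - xbar) * y) / sx2                     # slope estimate gamma1_hat
g0 = y.mean()                                         # intercept estimate gamma0_hat
S2 = np.sum((y - g0 - g1 * (x - xbar)) ** 2) / (n - 2)  # unbiased variance estimate

y_plus = 4.0                    # hypothetical new response
alpha = 0.025
tcrit = stats.t.ppf(1 - alpha, df=n - 2)

def T(x_plus):
    """The calibration pivot T(x+) evaluated at a trial value x+."""
    num = y_plus - g0 - g1 * (x_plus - xbar)
    den = np.sqrt(S2 * (1 + 1 / n + (x_plus - xbar) ** 2 / sx2))
    return num / den

grid = np.linspace(-5.0, 15.0, 2001)
inside = [xp for xp in grid if abs(T(xp)) <= tcrit]
interval = (min(inside), max(inside))   # the set X_{0.95} for these data
```

Part (b) explains when this set is a finite interval: the shape of \(g\) determines whether the inversion yields an interval, a complement of an interval, or the whole line.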