Problem 1
Write down the linear model corresponding to a simple random sample \(y_{1}, \ldots, y_{n}\) from the \(N\left(\mu, \sigma^{2}\right)\) distribution, and find the design matrix. Verify that $$ \widehat{\mu}=\left(X^{\mathrm{T}} X\right)^{-1} X^{\mathrm{T}} y=\bar{y}, \quad s^{2}=S S(\widehat{\beta}) /(n-p)=(n-1)^{-1} \sum\left(y_{j}-\bar{y}\right)^{2} $$
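As a quick numerical check (a sketch with simulated data, not part of the exercise), the design matrix here is a single column of ones, and the least squares formulas collapse to the sample mean and sample variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
y = rng.normal(loc=5.0, scale=2.0, size=n)   # simulated N(mu, sigma^2) sample

X = np.ones((n, 1))                          # design matrix: one intercept column
mu_hat = np.linalg.solve(X.T @ X, X.T @ y)[0]
s2 = np.sum((y - mu_hat) ** 2) / (n - 1)     # SS(beta_hat)/(n - p) with p = 1

assert np.isclose(mu_hat, y.mean())          # mu_hat equals the sample mean
assert np.isclose(s2, y.var(ddof=1))         # s^2 equals the sample variance
```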
Problem 2
Suppose that random variables \(Y_{g j}, j=1, \ldots, n_{g}, g=1, \ldots, G\), are independent and that they satisfy the normal linear model \(Y_{g j}=x_{g}^{\mathrm{T}} \beta+\varepsilon_{g j}\). Write down the covariate matrix for this model, and show that the least squares estimates can be written as \(\left(X_{1}^{\mathrm{T}} W X_{1}\right)^{-1} X_{1}^{\mathrm{T}} W Z\), where \(W=\operatorname{diag}\left\{n_{1}, \ldots, n_{G}\right\}\), and the \(g\)th element of \(Z\) is \(n_{g}^{-1} \sum_{j} Y_{g j}\). Hence show that weighted least squares based on \(Z\) and unweighted least squares based on \(Y\) give the same parameter estimates and confidence intervals when \(\sigma^{2}\) is known. Why do they differ if \(\sigma^{2}\) is unknown, unless \(n_{g} \equiv 1\)? Discuss how the residuals for the two setups differ, and say which is preferable for model checking.
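The coincidence of the two fits is easy to verify numerically; the sketch below uses made-up group sizes, covariates, and parameter values (none are from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n_g = np.array([3, 5, 2, 4])                              # hypothetical n_1, ..., n_G
X1 = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])  # one row x_g^T per group
beta = np.array([0.5, 1.5])

# full data: each group's covariate row repeated n_g times
X = np.repeat(X1, n_g, axis=0)
Y = X @ beta + rng.normal(size=n_g.sum())
beta_unweighted = np.linalg.lstsq(X, Y, rcond=None)[0]

# group means Z_g and weighted least squares with W = diag(n_1, ..., n_G)
starts = np.cumsum(n_g) - n_g
Z = np.array([Y[s:s + n].mean() for s, n in zip(starts, n_g)])
W = np.diag(n_g.astype(float))
beta_weighted = np.linalg.solve(X1.T @ W @ X1, X1.T @ W @ Z)

assert np.allclose(beta_unweighted, beta_weighted)        # identical estimates
```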
Problem 3
(a) Let \(A, B, C\), and \(D\) represent \(p \times p\), \(p \times q\), \(q \times q\), and \(q \times p\) matrices respectively. Show that, provided the necessary inverses exist, $$ (A+B C D)^{-1}=A^{-1}-A^{-1} B\left(C^{-1}+D A^{-1} B\right)^{-1} D A^{-1}. $$ (b) If the matrix \(A\) is partitioned as $$ A=\left(\begin{array}{ll} A_{11} & A_{12} \\ A_{21} & A_{22} \end{array}\right) $$ and the necessary inverses exist, show that the elements of the corresponding partition of \(A^{-1}\) are $$ \begin{aligned} A^{11} &=\left(A_{11}-A_{12} A_{22}^{-1} A_{21}\right)^{-1}, \quad A^{22}=\left(A_{22}-A_{21} A_{11}^{-1} A_{12}\right)^{-1}, \\ A^{12} &=-A_{11}^{-1} A_{12} A^{22}, \quad A^{21}=-A_{22}^{-1} A_{21} A^{11}. \end{aligned} $$
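Both identities can be sanity-checked on random well-conditioned matrices (a numerical sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 4, 2
inv = np.linalg.inv

# (a) the Woodbury-type identity
A = rng.normal(size=(p, p)) + p * np.eye(p)   # diagonal shift keeps it invertible
B = rng.normal(size=(p, q))
C = rng.normal(size=(q, q)) + q * np.eye(q)
D = rng.normal(size=(q, p))
lhs = inv(A + B @ C @ D)
rhs = inv(A) - inv(A) @ B @ inv(inv(C) + D @ inv(A) @ B) @ D @ inv(A)
assert np.allclose(lhs, rhs)

# (b) blocks of the partitioned inverse
M = rng.normal(size=(p + q, p + q)) + (p + q) * np.eye(p + q)
A11, A12, A21, A22 = M[:p, :p], M[:p, p:], M[p:, :p], M[p:, p:]
Minv = inv(M)
A_22 = inv(A22 - A21 @ inv(A11) @ A12)
assert np.allclose(Minv[:p, :p], inv(A11 - A12 @ inv(A22) @ A21))  # A^{11}
assert np.allclose(Minv[:p, p:], -inv(A11) @ A12 @ A_22)           # A^{12}
```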
Problem 4
The angles of the triangle \(\mathrm{ABC}\) are measured, with \(\mathrm{A}\) and \(\mathrm{B}\) each measured twice and \(\mathrm{C}\) measured three times. All the measurements are independent and unbiased, with common variance \(\sigma^{2}\). Find the least squares estimates of the angles \(\mathrm{A}\) and \(\mathrm{B}\) based on the seven measurements, and calculate the variance of these estimates.
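Using the constraint \(\mathrm{C} = 180^\circ - \mathrm{A} - \mathrm{B}\), the model is linear in \((\mathrm{A}, \mathrm{B})\); the sketch below (with invented measurements) sets up the \(7 \times 2\) design matrix and reads the variance factors off \(\left(X^{\mathrm{T}}X\right)^{-1}\):

```python
import numpy as np

a = [59.8, 60.3]                  # hypothetical measurements of angle A
b = [50.1, 49.7]                  # ... of angle B
c = [70.2, 69.9, 70.4]            # ... of angle C

# rows: (1,0) for A-readings, (0,1) for B-readings, and (-1,-1) for the
# C-readings after moving the known constant 180 to the left-hand side
X = np.array([[1, 0], [1, 0],
              [0, 1], [0, 1],
              [-1, -1], [-1, -1], [-1, -1]], dtype=float)
y = np.concatenate([a, b, np.array(c) - 180.0])

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least squares (A_hat, B_hat)
cov_over_sigma2 = np.linalg.inv(X.T @ X)        # covariance matrix / sigma^2
print(theta_hat)
print(np.diag(cov_over_sigma2))                 # both diagonal entries are 5/16
```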
Problem 5
Consider a linear regression model (8.1) in which the errors \(\varepsilon_{j}\) are independently distributed with Laplace density $$ f(u ; \sigma)=\left(2^{3 / 2} \sigma\right)^{-1} \exp \left\{-\left|u /\left(2^{1 / 2} \sigma\right)\right|\right\}, \quad -\infty<u<\infty, \; \sigma>0. $$ Verify that this density has variance \(\sigma^{2}\). Show that the maximum likelihood estimate of \(\beta\) is obtained by minimizing the \(L^{1}\) norm \(\sum\left|y_{j}-x_{j}^{\mathrm{T}} \beta\right|\) of \(y-X \beta\). Show that if in fact the \(\varepsilon_{j} \stackrel{\text{iid}}{\sim} N\left(0, \sigma^{2}\right)\), the asymptotic relative efficiency of these estimators relative to the least squares estimators is \(2 / \pi\).
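A numerical illustration of the \(L^1\) fit (a sketch with simulated data; the direct search below is one simple way to minimize the nondifferentiable criterion):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 100
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.laplace(scale=1.0, size=n)

# L1 (Laplace MLE) fit: minimize the sum of absolute residuals
fit = minimize(lambda b: np.abs(y - X @ b).sum(), x0=np.zeros(2),
               method="Nelder-Mead")
beta_l1 = fit.x

# least squares fit for comparison
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print("L1:", beta_l1, "LS:", beta_ls)
```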
Problem 6
Consider a linear model \(y_{j}=x_{j} \beta+\varepsilon_{j}, j=1, \ldots, n\), in which the \(\varepsilon_{j}\) are uncorrelated and have means zero. Find the minimum variance linear unbiased estimators of the scalar \(\beta\) when (i) \(\operatorname{var}\left(\varepsilon_{j}\right)=x_{j} \sigma^{2}\) (so the \(x_{j}\) are positive), and (ii) \(\operatorname{var}\left(\varepsilon_{j}\right)=x_{j}^{2} \sigma^{2}\). Generalize your results to the situation where \(\operatorname{var}\left(\varepsilon_{j}\right)=\sigma^{2} / w_{j}\), where the weights \(w_{j}\) are known but \(\sigma^{2}\) is not.
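For the general weighted case, the key facts can be illustrated directly (a sketch with hypothetical weights): with \(\operatorname{var}(\varepsilon_j) = \sigma^2/w_j\), the minimum variance linear unbiased estimator is \(\sum_j w_j x_j y_j / \sum_j w_j x_j^2\), and only the relative weights matter, so \(\sigma^2\) need not be known.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(1, 5, size=n)
w = rng.uniform(0.5, 2.0, size=n)             # known weights w_j
eps = rng.normal(scale=1.0 / np.sqrt(w))      # var(eps_j) = sigma^2 / w_j, sigma = 1
y = 2.0 * x + eps                             # true beta = 2

beta_hat = np.sum(w * x * y) / np.sum(w * x * x)
print(beta_hat)                               # near 2; sigma^2 never used
```

Cases (i) and (ii) correspond to \(w_j = 1/x_j\) and \(w_j = 1/x_j^2\), giving \(\sum y_j / \sum x_j\) and \(n^{-1}\sum y_j/x_j\) respectively.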
Problem 7
The usual linear model \(y=X \beta+\varepsilon\) is thought to apply to a set of data, and it is assumed that the \(\varepsilon_{j}\) are independent with means zero and variances \(\sigma^{2}\), so that the data are summarized in terms of the usual least squares estimate \(\widehat{\beta}\) and the variance estimate \(S^{2}\). Unknown to the unfortunate investigator, in fact \(\operatorname{var}\left(\varepsilon_{j}\right)=v_{j} \sigma^{2}\), where \(v_{1}, \ldots, v_{n}\) are unequal. Show that \(\widehat{\beta}\) remains unbiased for \(\beta\), and find its actual covariance matrix.
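A Monte Carlo sketch (with a hypothetical design and hypothetical \(v_j\)) confirms the two claims: the least squares estimator stays unbiased, and its true covariance is the sandwich form \(\sigma^2\left(X^{\mathrm{T}}X\right)^{-1} X^{\mathrm{T}} V X \left(X^{\mathrm{T}}X\right)^{-1}\) with \(V = \operatorname{diag}\{v_1, \ldots, v_n\}\):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 30, 20000
X = np.column_stack([np.ones(n), rng.uniform(0, 1, size=n)])
v = rng.uniform(0.2, 3.0, size=n)            # unequal variance factors v_j
beta = np.array([1.0, -2.0])

bread = np.linalg.inv(X.T @ X) @ X.T         # maps y to beta_hat
sandwich = bread @ np.diag(v) @ bread.T      # true cov(beta_hat) / sigma^2

eps = rng.normal(scale=np.sqrt(v), size=(reps, n))   # sigma = 1
betas = (X @ beta + eps) @ bread.T           # one OLS estimate per replicate

print(betas.mean(axis=0))                    # close to beta: unbiasedness
print(np.abs(np.cov(betas.T) - sandwich).max())  # small: sandwich covariance
```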
Problem 8
Suppose that the straight-line regression model \(y=\beta_{0}+\beta_{1} x+\varepsilon\) is fitted to data in which \(x_{1}=\cdots=x_{n-1}=-a\) and \(x_{n}=(n-1) a\), for some positive \(a\). Show that although \(y_{n}\) completely determines the estimate of \(\beta_{1}\), its Cook's distance satisfies \(C_{n}=0\). Is Cook's distance an effective measure of influence in this situation?
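The degeneracy is easy to see numerically (a sketch with assumed values \(a = 1\), \(n = 10\)): the leverage of the \(n\)th case equals 1, so its raw residual vanishes identically, and the numerator of Cook's distance is zero.

```python
import numpy as np

n, a = 10, 1.0                                # assumed values
x = np.r_[np.full(n - 1, -a), (n - 1) * a]
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix

rng = np.random.default_rng(8)
y = 1.0 + 2.0 * x + rng.normal(size=n)
e = y - H @ y                                 # residuals

print(np.diag(H)[-1])                         # leverage h_n = 1.0
print(e[-1])                                  # residual e_n = 0 (to rounding)
# Cook's distance C_n has e_n^2 in its numerator, so C_n = 0 even though
# the fitted line passes exactly through (x_n, y_n).
```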
Problem 9
Consider a normal linear regression \(y=\beta_{0}+\beta_{1} x+\varepsilon\) in which the parameter of interest is \(\psi=\beta_{0} / \beta_{1}\), to be estimated by \(\widehat{\psi}=\widehat{\beta}_{0} / \widehat{\beta}_{1}\); let \(\operatorname{var}\left(\widehat{\beta}_{0}\right)=\sigma^{2} v_{00}\), \(\operatorname{cov}\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}\right)=\sigma^{2} v_{01}\), and \(\operatorname{var}\left(\widehat{\beta}_{1}\right)=\sigma^{2} v_{11}\). (a) Show that $$ \frac{\widehat{\beta}_{0}-\psi \widehat{\beta}_{1}}{\left\{s^{2}\left(v_{00}-2 \psi v_{01}+\psi^{2} v_{11}\right)\right\}^{1 / 2}} \sim t_{n-p}, $$ and hence deduce that a \((1-2 \alpha)\) confidence interval for \(\psi\) is the set of values of \(\psi\) satisfying the inequality $$ \widehat{\beta}_{0}^{2}-s^{2} t_{n-p}^{2}(\alpha) v_{00}+2 \psi\left\{s^{2} t_{n-p}^{2}(\alpha) v_{01}-\widehat{\beta}_{0} \widehat{\beta}_{1}\right\}+\psi^{2}\left\{\widehat{\beta}_{1}^{2}-s^{2} t_{n-p}^{2}(\alpha) v_{11}\right\} \leq 0. $$ How would this change if the value of \(\sigma\) were known? (b) By considering the coefficients on the left-hand side of the inequality in (a), show that the confidence set can be empty, a finite interval, a semi-infinite interval stretching to \(+\infty\) or to \(-\infty\), the entire real line, or two disjoint semi-infinite intervals: six possibilities in all. In each case illustrate how the set could arise by sketching a set of data that might have given rise to it. (c) A government Department of Fisheries needed to estimate how many of a certain species of fish there were in the sea, in order to know whether to continue to license commercial fishing. Each year an extensive sampling exercise was based on the numbers of fish caught, and this resulted in three numbers: \(y\), \(x\), and a standard deviation \(\sigma\) for \(y\). A simple model of fish population dynamics suggested that \(y=\beta_{0}+\beta_{1} x+\varepsilon\), where the errors \(\varepsilon\) are independent, and the original population size was \(\psi=\beta_{0} / \beta_{1}\). To simplify the calculations, suppose that in each year \(\sigma\) equalled 25. If the values of \(y\) and \(x\) after five years had been $$ \begin{array}{cccccc} y: & 160 & 150 & 100 & 80 & 100 \\ x: & 140 & 170 & 200 & 230 & 260 \end{array} $$ give a \(95 \%\) confidence interval for \(\psi\). Do you find it plausible that \(\sigma=25\)? If not, give an appropriate interval for \(\psi\).
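For part (c), the quadratic inequality of part (a) can be evaluated directly, with \(s\,t_{n-p}(\alpha)\) replaced by \(\sigma z_{\alpha}\) since \(\sigma = 25\) is treated as known (a sketch; the 1.96 quantile gives the 95% set):

```python
import numpy as np

y = np.array([160.0, 150.0, 100.0, 80.0, 100.0])
x = np.array([140.0, 170.0, 200.0, 230.0, 260.0])
X = np.column_stack([np.ones(5), x])

V = np.linalg.inv(X.T @ X)                    # (v_00, v_01; v_01, v_11)
b0, b1 = np.linalg.solve(X.T @ X, X.T @ y)    # least squares estimates

sigma, z = 25.0, 1.96                         # sigma known, 95% normal quantile
k = sigma**2 * z**2
c2 = b1**2 - k * V[1, 1]                      # coefficient of psi^2
c1 = 2 * (k * V[0, 1] - b0 * b1)              # coefficient of psi
c0 = b0**2 - k * V[0, 0]                      # constant term

roots = np.sort(np.roots([c2, c1, c0]).real)
print(b0 / b1, roots)                         # a finite interval when c2 > 0
```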
Problem 10
Over a period of \(2 m+1\) years the quarterly gas consumption of a particular household may be represented by the model $$ Y_{i j}=\beta_{i}+\gamma j+\varepsilon_{i j}, \quad i=1, \ldots, 4, \quad j=-m,-m+1, \ldots, m-1, m, $$ where the parameters \(\beta_{i}\) and \(\gamma\) are unknown, and \(\varepsilon_{i j} \stackrel{\text{iid}}{\sim} N\left(0, \sigma^{2}\right)\). Find the least squares estimators and show that they are independent, with variances \((2 m+1)^{-1} \sigma^{2}\) and \(\sigma^{2} /\left(8 \sum_{i=1}^{m} i^{2}\right)\). Show also that $$ (8 m-1)^{-1}\left[\sum_{i=1}^{4} \sum_{j=-m}^{m} Y_{i j}^{2}-(2 m+1) \sum_{i=1}^{4} \bar{Y}_{i \cdot}^{2}-\frac{2\left(\sum_{j=-m}^{m} j \bar{Y}_{\cdot j}\right)^{2}}{\sum_{i=1}^{m} i^{2}}\right] $$ is unbiased for \(\sigma^{2}\), where \(\bar{Y}_{i \cdot}=(2 m+1)^{-1} \sum_{j=-m}^{m} Y_{i j}\) and \(\bar{Y}_{\cdot j}=\frac{1}{4} \sum_{i=1}^{4} Y_{i j}\).
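Unbiasedness of the displayed statistic can be checked by simulation (a sketch with assumed \(m\), \(\beta_i\), \(\gamma\), and \(\sigma^2\); none are from the text):

```python
import numpy as np

rng = np.random.default_rng(10)
m, sigma2, reps = 3, 2.0, 5000
j = np.arange(-m, m + 1)
beta = np.array([1.0, 2.0, 3.0, 4.0])         # hypothetical quarter effects
gamma = 0.5                                   # hypothetical yearly trend
mean = beta[:, None] + gamma * j[None, :]     # 4 x (2m+1) matrix of means
sum_i2 = (np.arange(1, m + 1) ** 2).sum()

vals = np.empty(reps)
for r in range(reps):
    Y = mean + rng.normal(scale=np.sqrt(sigma2), size=mean.shape)
    Ybar_i = Y.mean(axis=1)                   # row means (over j)
    Ybar_j = Y.mean(axis=0)                   # column means (over i)
    S = ((Y**2).sum() - (2*m + 1) * (Ybar_i**2).sum()
         - 2 * (j * Ybar_j).sum()**2 / sum_i2)
    vals[r] = S / (8*m - 1)

print(vals.mean())                            # close to sigma2 = 2.0
```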