Problem 5
Consider a linear smoother with \(n \times n\) smoothing matrix \(S_{h}\), so \(\widehat{g}=S_{h} y\), and show that the function \(a_{j}(u)\) giving the fitted value at \(x_{j}\) as a function of the response \(u\) there satisfies $$ a_{j}(u)= \begin{cases}\widehat{g}\left(x_{j}\right), & u=y_{j}, \\ \widehat{g}_{-j}\left(x_{j}\right), & u=\widehat{g}_{-j}\left(x_{j}\right).\end{cases} $$ Explain why this implies that \(S_{j j}(h)\left\{y_{j}-\widehat{g}_{-j}\left(x_{j}\right)\right\}=\widehat{g}\left(x_{j}\right)-\widehat{g}_{-j}\left(x_{j}\right)\), and hence obtain \((10.42)\).
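The identity just stated can be checked numerically for one concrete linear smoother. The sketch below is not part of the problem: it assumes a Nadaraya–Watson smoother with a Gaussian kernel, for which the leave-one-out fit \(\widehat{g}_{-j}(x_j)\) is obtained by deleting observation \(j\) and renormalising the remaining kernel weights; the simulated data, kernel and bandwidth are arbitrary illustrative choices.

```python
# A minimal numerical check of the leave-one-out identity, assuming a
# Nadaraya-Watson kernel smoother (one concrete choice of linear smoother);
# the bandwidth h and the simulated data are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n, h = 50, 0.15
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

# Smoothing matrix S_h: row j holds the normalised kernel weights at x_j.
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)
g_hat = S @ y                                   # \widehat{g}(x_j) = (S_h y)_j

# Leave-one-out fit at x_j: drop point j and renormalise the remaining weights.
g_loo = np.empty(n)
for j in range(n):
    w = np.delete(K[j], j)
    g_loo[j] = w @ np.delete(y, j) / w.sum()    # \widehat{g}_{-j}(x_j)

lhs = np.diag(S) * (y - g_loo)                  # S_jj { y_j - g_{-j}(x_j) }
rhs = g_hat - g_loo                             # g(x_j) - g_{-j}(x_j)
print(np.max(np.abs(lhs - rhs)))                # identity holds up to rounding error
```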
Problem 6
The rate of growth of an epidemic such as AIDS for a large population can be estimated fairly accurately and treated as a known function \(g(t)\) of time \(t\). In a smaller area where few cases have been observed the rate is hard to estimate because data are scarce. However, predictions of the numbers of future cases in such an area must be made in order to allocate resources such as hospital beds. A simple assumption is that cases in the area arise in a non-homogeneous Poisson process with rate \(\lambda g(t)\), for which the mean number of cases in the period \(\left(t_{1}, t_{2}\right)\) is \(\lambda \int_{t_{1}}^{t_{2}} g(t)\, d t\). Suppose that \(N_{1}=n_{1}\) individuals with the disease have been observed in the period \((-\infty, 0)\), and that predictions are required for the number \(N_{2}\) of cases to be observed in a future period \(\left(t_{1}, t_{2}\right)\). (a) Find the conditional distribution of \(N_{2}\) given \(N_{1}+N_{2}\), and show it to be free of \(\lambda\). Deduce that a \((1-2 \alpha)\) prediction interval \(\left(n_{-}, n_{+}\right)\) for \(N_{2}\) is found by solving approximately the equations $$ \begin{aligned} &\alpha=\operatorname{Pr}\left(N_{2} \leq n_{-} \mid N_{1}+N_{2}=n_{1}+n_{-}\right), \\ &\alpha=\operatorname{Pr}\left(N_{2} \geq n_{+} \mid N_{1}+N_{2}=n_{1}+n_{+}\right). \end{aligned} $$ (b) Use a normal approximation to the conditional distribution in (a) to show that for moderate to large \(n_{1}\), \(n_{-}\) and \(n_{+}\) are the solutions of the quadratic equation $$ (1-p)^{2} n^{2}+p(p-1)\left(2 n_{1}+z_{\alpha}^{2}\right) n+n_{1} p\left\{n_{1} p-(1-p) z_{\alpha}^{2}\right\}=0, $$ where \(\Phi\left(z_{\alpha}\right)=\alpha\) and $$ p=\int_{t_{1}}^{t_{2}} g(t)\, d t \Big/\left\{\int_{t_{1}}^{t_{2}} g(t)\, d t+\int_{-\infty}^{0} g(t)\, d t\right\}. $$ (c) Find approximate \(0.90\) prediction intervals for the special case where \(g(t)=2^{t / 2}\), so that the doubling time for the epidemic is two years, \(n_{1}=10\) cases have been observed up to time 0, and \(t_{1}=0\), \(t_{2}=1\) (next year) (Cox and Davison, 1989). (d) Show that conditional on \(A\), \(R_{1}\) has a generalized linear model density with $$ b(\theta)=\log \left\{\sum_{u=u_{-}}^{u_{+}}\binom{m_{1}}{u}\binom{m_{0}}{a-u} e^{u \theta}\right\}, \quad u_{-}=\max \left\{0, a-m_{0}\right\}, \quad u_{+}=\min \left\{m_{1}, a\right\}. $$ Deduce that a score test of \(\Delta=1\) based on data from \(n\) independent \(2 \times 2\) tables \(\left(R_{0 j}, m_{0 j}-R_{0 j} ; R_{1 j}, m_{1 j}-R_{1 j}\right)\) is obtained by treating \(\sum R_{1 j}\) as approximately normal with mean and variance $$ \sum_{j=1}^{n} \frac{m_{1 j} a_{j}}{m_{0 j}+m_{1 j}}, \quad \sum_{j=1}^{n} \frac{m_{0 j} m_{1 j} a_{j}\left(m_{0 j}+m_{1 j}-a_{j}\right)}{\left(m_{0 j}+m_{1 j}\right)^{2}\left(m_{0 j}+m_{1 j}-1\right)}; $$ when continuity-corrected this is the Mantel–Haenszel test (Mantel and Haenszel, 1959).
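For part (c), one way to see what the quadratic in part (b) yields is to evaluate \(p\) for \(g(t)=2^{t/2}\) and solve the equation numerically. The sketch below is only an illustration of that computation under the stated settings \(n_{1}=10\), \(t_{1}=0\), \(t_{2}=1\) and \(\alpha=0.05\); it is not offered as the worked answer.

```python
# A hedged numerical sketch for part (c): plug p and z_alpha into the quadratic
# of part (b) and take its two roots as (n_-, n_+).  The values of p, n_1 and
# z_alpha follow the settings stated in the problem; the resulting interval is
# only an illustration of the computation, not a worked answer to check against.
import numpy as np
from scipy.stats import norm

n1 = 10
# g(t) = 2^{t/2}: an antiderivative of 2^{t/2} is 2^{t/2} * 2 / log 2
future = (2 ** 0.5 - 1) * 2 / np.log(2)   # \int_0^1 g(t) dt
past = 1.0 * 2 / np.log(2)                # \int_{-infinity}^0 g(t) dt
p = future / (future + past)

z = norm.ppf(0.05)                        # z_alpha with alpha = 0.05, Phi(z_alpha) = alpha
a = (1 - p) ** 2
b = p * (p - 1) * (2 * n1 + z ** 2)
c = n1 * p * (n1 * p - (1 - p) * z ** 2)
roots = np.roots([a, b, c])
print(np.sort(roots))                     # the two roots give approximate (n_-, n_+)
```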
Problem 6
In \((10.17)\), suppose that \(\phi_{j}=\phi a_{j}\), where the \(a_{j}\) are known constants, and that \(\phi\) is functionally independent of \(\beta\). Show that the likelihood equations for \(\beta\) are independent of \(\phi\), and deduce that the profile log likelihood for \(\phi\) is $$ \ell_{\mathrm{p}}(\phi)=\sum_{j=1}^{n}\left\{\frac{y_{j} \widehat{\theta}_{j}-b\left(\widehat{\theta}_{j}\right)}{\phi a_{j}}+c\left(y_{j} ; \phi a_{j}\right)\right\}. $$ Hence show that for gamma data the maximum likelihood estimate of \(\nu\) solves the equation \(\log \nu-\psi(\nu)=n^{-1} \sum_{j}\left(z_{j}-\log z_{j}-1\right)\), where \(z_{j}=y_{j} / \widehat{\mu}_{j}\) and \(\psi(\nu)\) is the digamma function \(d \log \Gamma(\nu) / d \nu\).
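The gamma estimating equation at the end of the problem is easy to solve numerically, since \(\log\nu-\psi(\nu)\) decreases monotonically from \(+\infty\) to 0. The sketch below assumes fitted means \(\widehat{\mu}_{j}\) are already available from a gamma fit (here they are simply simulated for illustration) and uses a bracketing root-finder.

```python
# A minimal sketch of solving log(nu) - psi(nu) = mean(z - log z - 1) for the
# gamma shape nu, assuming the fitted means mu_hat are already available from a
# gamma GLM fit; the data below are simulated purely for illustration.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(2)
true_nu = 4.0
mu_hat = rng.uniform(1, 5, 200)                        # stand-in for fitted means
y = rng.gamma(shape=true_nu, scale=mu_hat / true_nu)   # gamma data with mean mu_hat

z = y / mu_hat
rhs = np.mean(z - np.log(z) - 1)

# log(nu) - digamma(nu) is decreasing from +infinity to 0, so bracket and solve.
nu_hat = brentq(lambda nu: np.log(nu) - digamma(nu) - rhs, 1e-6, 1e6)
print(nu_hat)   # should be close to true_nu for a sample of this size
```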
Problem 7
Develop the details of local likelihood smoothing when a linear polynomial is fitted to Poisson data, using link function \(\log \mu=\beta_{0}+\beta_{1}\left(x-x_{0}\right)\).
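A possible starting point is to maximise, at each target point \(x_{0}\), the kernel-weighted Poisson log likelihood with linear predictor \(\beta_{0}+\beta_{1}(x-x_{0})\) and take \(\exp(\widehat{\beta}_{0})\) as the local estimate of \(\mu(x_{0})\). The sketch below does this by direct numerical optimisation; the Gaussian kernel, bandwidth and simulated data are assumptions made only for illustration.

```python
# A rough sketch of local likelihood smoothing for Poisson counts: at each target
# point x0, maximise the kernel-weighted Poisson log likelihood with linear
# predictor log mu = b0 + b1 (x - x0), and report exp(b0_hat) as the fitted mean
# at x0.  The kernel, bandwidth and simulated data are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, h = 200, 0.2
x = np.sort(rng.uniform(0, 1, n))
y = rng.poisson(np.exp(1 + np.sin(2 * np.pi * x)))

def local_fit(x0):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)         # kernel weights centred at x0
    def nll(beta):                                  # minus the weighted log likelihood
        eta = beta[0] + beta[1] * (x - x0)
        return -np.sum(w * (y * eta - np.exp(eta)))
    start = np.array([np.log(y.mean() + 0.5), 0.0])
    return np.exp(minimize(nll, start, method="BFGS").x[0])

grid = np.linspace(0.05, 0.95, 10)
print([round(local_fit(x0), 2) for x0 in grid])     # local estimates of mu(x0)
```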
Problem 7
Suppose that the cumulant-generating function of \(X\) can be written in the form \(m\{b(\theta+t)-b(\theta)\}\). Let \(\mathrm{E}(X)=\mu=m b^{\prime}(\theta)\), and let \(\kappa_{2}(\mu)\) and \(\kappa_{3}(\mu)\) be the variance and third cumulant of \(X\), respectively, expressed in terms of \(\mu\); \(\kappa_{2}(\mu)\) is the variance function \(V(\mu)\). (a) Show that $$ \kappa_{3}(\mu)=\kappa_{2}(\mu) \kappa_{2}^{\prime}(\mu) \quad \text { and } \quad \frac{\kappa_{3}}{\kappa_{2}^{2}}=\frac{d}{d \mu} \log \kappa_{2}(\mu). $$ Verify that the binomial cumulants have this form with \(b(\theta)=\log \left(1+e^{\theta}\right)\). (b) Show that if the derivatives of \(b(\theta)\) are all \(O(1)\), then \(Y=g(X)\) is approximately symmetrically distributed if \(g\) satisfies the second-order differential equation $$ 3 \kappa_{2}^{2}(\mu) g^{\prime \prime}(\mu)+g^{\prime}(\mu) \kappa_{3}(\mu)=0. $$ Show that if \(\kappa_{2}(\mu)\) and \(\kappa_{3}(\mu)\) are related as in (a), then $$ g(x)=\int^{x} \kappa_{2}^{-1 / 3}(\mu)\, d \mu. $$ (c) Hence find symmetrizing transformations for Poisson and binomial variables. (McCullagh and Nelder, 1989, Section 4.8)
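For the Poisson case \(\kappa_{2}(\mu)=\mu\), so the recipe in (b) gives \(g(x)\propto x^{2/3}\), and a quick Monte Carlo check of the resulting reduction in skewness is easy to code. The sketch below treats only that case; the mean and sample size are arbitrary.

```python
# A quick Monte Carlo check, for the Poisson case only, that the transformation
# g(x) = x^(2/3) obtained from g(x) = integral of kappa_2^{-1/3}(mu) d mu
# (with kappa_2(mu) = mu) reduces skewness; the mean 4 and sample size are arbitrary.
import numpy as np

rng = np.random.default_rng(4)
x = rng.poisson(4.0, size=200_000)

def skew(v):
    v = v - v.mean()
    return np.mean(v ** 3) / np.mean(v ** 2) ** 1.5

print(skew(x), skew(x ** (2 / 3.0)))   # raw skewness about 0.5, transformed much closer to 0
```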
Problem 9
At each of the doses \(x_{1}
Problem 11
Let \(Y\) be binomial with probability \(\pi=e^{\lambda} /\left(1+e^{\lambda}\right)\) and denominator \(m\). (a) Show that \(m-Y\) is binomial with \(\lambda^{\prime}=-\lambda\). Consider $$ \tilde{\lambda}=\log \left(\frac{Y+c_{1}}{m-Y+c_{2}}\right) $$ as an estimator of \(\lambda\). Show that in order to achieve consistency under the transformation \(Y \rightarrow m-Y\), we must have \(c_{1}=c_{2}\). (b) Write \(Y=m \pi+\sqrt{m \pi(1-\pi)} Z\), where \(Z=O_{p}(1)\) for large \(m\). Show that $$ \mathrm{E}\{\log (Y+c)\}=\log (m \pi)+\frac{c}{m \pi}-\frac{1-\pi}{2 m \pi}+O\left(m^{-3 / 2}\right). $$ Find the corresponding expansion for \(\mathrm{E}\{\log (m-Y+c)\}\), and with \(c_{1}=c_{2}=c\) find the value of \(c\) for which \(\tilde{\lambda}\) is unbiased for \(\lambda\) to order \(m^{-1}\). What is the connection to the empirical logistic transform? (Cox, 1970, Section 3.2)
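A small simulation makes the bias comparison in (b) concrete. The sketch below evaluates the bias of \(\tilde{\lambda}\) for several candidate values of \(c_{1}=c_{2}=c\); the choices of \(m\), \(\pi\) and the grid of \(c\) values are illustrative, and \(c=1/2\) is the value used by the empirical logistic transform.

```python
# A small simulation comparing the bias of lambda_tilde = log{(Y+c)/(m-Y+c)} for a
# few candidate values of c; the choices of m, pi and the grid of c values are
# illustrative.  (The empirical logistic transform takes c = 1/2.)
import numpy as np

rng = np.random.default_rng(5)
m, pi = 50, 0.4
lam = np.log(pi / (1 - pi))
y = rng.binomial(m, pi, size=1_000_000)

for c in (0.0, 0.25, 0.5, 1.0):
    lam_tilde = np.log((y + c) / (m - y + c))
    print(c, lam_tilde.mean() - lam)     # estimated bias; smallest in magnitude near c = 0.5
```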
Problem 17
Consider independent exponential variables \(Y_{j}\) with densities \(\lambda_{j} \exp\left(-\lambda_{j} y_{j}\right)\), where \(\lambda_{j}=\exp\left(\beta_{0}+\beta_{1} x_{j}\right)\) for \(j=1, \ldots, n\), with \(x_{j}\) scalar and \(\sum x_{j}=0\) without loss of generality.
(a) Find the expected information for \(\beta_{0}, \beta_{1}\) and show that the maximum likelihood estimator \(\widehat{\beta}_{1}\) has asymptotic variance \(\left(n m_{2}\right)^{-1}\), where \(m_{2}=n^{-1} \sum x_{j}^{2}\).
(b) Under no censoring, show that the partial log likelihood for \(\beta_{1}\) equals
$$
-\sum_{j=1}^{n} \log \left\{\sum_{i=j}^{n} \exp \left(\beta_{1} x_{(i)}\right)\right\},
$$
where the elements of the rank statistic \(R=\{(1), \ldots,(n)\}\) are determined by the ordering of the failure times, \(y_{(1)}<\cdots<y_{(n)}\).
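A rough Monte Carlo experiment can accompany part (b): maximise the displayed partial log likelihood over \(\beta_{1}\) for simulated exponential data and compare the spread of the estimates with the asymptotic variance \((n m_{2})^{-1}\) from part (a). Everything in the sketch below (sample size, parameter values, covariate distribution) is an illustrative assumption.

```python
# A rough Monte Carlo sketch for part (b), under assumed values of n, beta_0,
# beta_1 and a simulated covariate (none of these are specified in the problem):
# maximise the partial log likelihood -sum_j log{sum_{i=j}^n exp(beta_1 x_(i))}
# (the term sum_j beta_1 x_(j) vanishes because sum x_j = 0) and compare the
# spread of the estimates with the asymptotic variance 1/(n m_2) from part (a).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
n, beta0, beta1 = 100, 0.5, 1.0
x = rng.normal(size=n)
x -= x.mean()                        # impose sum x_j = 0, as in the problem
m2 = np.mean(x ** 2)

def partial_mle():
    y = rng.exponential(scale=np.exp(-(beta0 + beta1 * x)))   # rate exp(beta0 + beta1 x_j)
    xo = x[np.argsort(y)]            # covariates ordered by increasing failure time
    def neg_pl(b):
        # risk-set sums sum_{i=j}^n exp(b x_(i)) via a reversed cumulative sum
        risk = np.cumsum(np.exp(b * xo)[::-1])[::-1]
        return np.sum(np.log(risk))
    return minimize_scalar(neg_pl, bounds=(-5, 5), method="bounded").x

est = np.array([partial_mle() for _ in range(200)])
print(est.var(), 1.0 / (n * m2))     # Monte Carlo variance vs. asymptotic variance of the full MLE
```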
Problem 18
Suppose that \(n\) independent Poisson processes of rates \(\lambda_{j}(y)\) are observed simultaneously, and that the \(m\) events occur at \(0