Problem 5
Consider a linear smoother with \(n \times n\) smoothing matrix \(S_{h}\), so \(\widehat{g}=S_{h} y\), and show that the function \(a_{j}(u)\) giving the fitted value at \(x_{j}\) as a function of the response \(u\) there satisfies $$ a_{j}(u)= \begin{cases}\widehat{g}\left(x_{j}\right), & u=y_{j}, \\ \widehat{g}_{-j}\left(x_{j}\right), & u=\widehat{g}_{-j}\left(x_{j}\right).\end{cases} $$ Explain why this implies that \(S_{j j}(h)\left\{y_{j}-\widehat{g}_{-j}\left(x_{j}\right)\right\}=\widehat{g}\left(x_{j}\right)-\widehat{g}_{-j}\left(x_{j}\right)\), and hence obtain \((10.42)\).
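The identity just stated can be checked numerically for one concrete linear smoother. The sketch below is not part of the problem: it assumes a Nadaraya–Watson smoother with a Gaussian kernel, for which the leave-one-out fit \(\widehat{g}_{-j}(x_j)\) is obtained by deleting observation \(j\) and renormalising the remaining kernel weights; the simulated data, kernel and bandwidth are arbitrary illustrative choices.

```python
# A minimal numerical check of the leave-one-out identity, assuming a
# Nadaraya-Watson kernel smoother (one concrete choice of linear smoother);
# the bandwidth h and the simulated data are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n, h = 50, 0.15
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

# Smoothing matrix S_h: row j holds the normalised kernel weights at x_j.
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)
g_hat = S @ y                                   # \widehat{g}(x_j) = (S_h y)_j

# Leave-one-out fit at x_j: drop point j and renormalise the remaining weights.
g_loo = np.empty(n)
for j in range(n):
    w = np.delete(K[j], j)
    g_loo[j] = w @ np.delete(y, j) / w.sum()    # \widehat{g}_{-j}(x_j)

lhs = np.diag(S) * (y - g_loo)                  # S_jj { y_j - g_{-j}(x_j) }
rhs = g_hat - g_loo                             # g(x_j) - g_{-j}(x_j)
print(np.max(np.abs(lhs - rhs)))                # identity holds up to rounding error
```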
Problem 6
The rate of growth of an epidemic such as AIDS for a large population can be estimated fairly accurately and treated as a known function \(g(t)\) of time \(t\). In a smaller area where few cases have been observed the rate is hard to estimate because data are scarce. However, predictions of the numbers of future cases in such an area must be made in order to allocate resources such as hospital beds. A simple assumption is that cases in the area arise in a non-homogeneous Poisson process with rate \(\lambda g(t)\), for which the mean number of cases in the period \(\left(t_{1}, t_{2}\right)\) is \(\lambda \int_{t_{1}}^{t_{2}} g(t)\, d t\). Suppose that \(N_{1}=n_{1}\) individuals with the disease have been observed in the period \((-\infty, 0)\), and that predictions are required for the number \(N_{2}\) of cases to be observed in a future period \(\left(t_{1}, t_{2}\right)\). (a) Find the conditional distribution of \(N_{2}\) given \(N_{1}+N_{2}\), and show it to be free of \(\lambda\). Deduce that a \((1-2 \alpha)\) prediction interval \(\left(n_{-}, n_{+}\right)\) for \(N_{2}\) is found by solving approximately the equations $$ \begin{aligned} &\alpha=\operatorname{Pr}\left(N_{2} \leq n_{-} \mid N_{1}+N_{2}=n_{1}+n_{-}\right), \\ &\alpha=\operatorname{Pr}\left(N_{2} \geq n_{+} \mid N_{1}+N_{2}=n_{1}+n_{+}\right). \end{aligned} $$ (b) Use a normal approximation to the conditional distribution in (a) to show that for moderate to large \(n_{1}\), \(n_{-}\) and \(n_{+}\) are the solutions of the quadratic equation $$ (1-p)^{2} n^{2}+p(p-1)\left(2 n_{1}+z_{\alpha}^{2}\right) n+n_{1} p\left\{n_{1} p-(1-p) z_{\alpha}^{2}\right\}=0, $$ where \(\Phi\left(z_{\alpha}\right)=\alpha\) and $$ p=\int_{t_{1}}^{t_{2}} g(t)\, d t \Big/\left\{\int_{t_{1}}^{t_{2}} g(t)\, d t+\int_{-\infty}^{0} g(t)\, d t\right\}. $$ (c) Find approximate \(0.90\) prediction intervals for the special case where \(g(t)=2^{t / 2}\), so that the doubling time for the epidemic is two years, \(n_{1}=10\) cases have been observed up to time 0, and \(t_{1}=0\), \(t_{2}=1\) (next year) (Cox and Davison, 1989). (d) Show that conditional on \(A\), \(R_{1}\) has a generalized linear model density with $$ b(\theta)=\log \left\{\sum_{u=u_{-}}^{u_{+}}\binom{m_{1}}{u}\binom{m_{0}}{a-u} e^{u \theta}\right\}, \quad u_{-}=\max \left\{0, a-m_{0}\right\}, \quad u_{+}=\min \left\{m_{1}, a\right\}. $$ Deduce that a score test of \(\Delta=1\) based on data from \(n\) independent \(2 \times 2\) tables \(\left(R_{0 j}, m_{0 j}-R_{0 j} ; R_{1 j}, m_{1 j}-R_{1 j}\right)\) is obtained by treating \(\sum R_{1 j}\) as approximately normal with mean and variance $$ \sum_{j=1}^{n} \frac{m_{1 j} a_{j}}{m_{0 j}+m_{1 j}}, \quad \sum_{j=1}^{n} \frac{m_{0 j} m_{1 j} a_{j}\left(m_{0 j}+m_{1 j}-a_{j}\right)}{\left(m_{0 j}+m_{1 j}\right)^{2}\left(m_{0 j}+m_{1 j}-1\right)}; $$ when continuity-corrected this is the Mantel–Haenszel test (Mantel and Haenszel, 1959).
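For part (c), one way to see what the quadratic in part (b) yields is to evaluate \(p\) for \(g(t)=2^{t/2}\) and solve the equation numerically. The sketch below is only an illustration of that computation under the stated settings \(n_{1}=10\), \(t_{1}=0\), \(t_{2}=1\) and \(\alpha=0.05\); it is not offered as the worked answer.

```python
# A hedged numerical sketch for part (c): plug p and z_alpha into the quadratic
# of part (b) and take its two roots as (n_-, n_+).  The values of p, n_1 and
# z_alpha follow the settings stated in the problem; the resulting interval is
# only an illustration of the computation, not a worked answer to check against.
import numpy as np
from scipy.stats import norm

n1 = 10
# g(t) = 2^{t/2}: an antiderivative of 2^{t/2} is 2^{t/2} * 2 / log 2
future = (2 ** 0.5 - 1) * 2 / np.log(2)   # \int_0^1 g(t) dt
past = 1.0 * 2 / np.log(2)                # \int_{-infinity}^0 g(t) dt
p = future / (future + past)

z = norm.ppf(0.05)                        # z_alpha with alpha = 0.05, Phi(z_alpha) = alpha
a = (1 - p) ** 2
b = p * (p - 1) * (2 * n1 + z ** 2)
c = n1 * p * (n1 * p - (1 - p) * z ** 2)
roots = np.roots([a, b, c])
print(np.sort(roots))                     # the two roots give approximate (n_-, n_+)
```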
Problem 6
In \((10.17)\), suppose that \(\phi_{j}=\phi a_{j}\), where the \(a_{j}\) are known constants, and that \(\phi\) is functionally independent of \(\beta\). Show that the likelihood equations for \(\beta\) are independent of \(\phi\), and deduce that the profile log likelihood for \(\phi\) is $$ \ell_{\mathrm{p}}(\phi)=\sum_{j=1}^{n}\left\{\frac{y_{j} \widehat{\theta}_{j}-b\left(\widehat{\theta}_{j}\right)}{\phi a_{j}}+c\left(y_{j} ; \phi a_{j}\right)\right\}. $$ Hence show that for gamma data the maximum likelihood estimate of \(\nu\) solves the equation \(\log \nu-\psi(\nu)=n^{-1} \sum_{j}\left(z_{j}-\log z_{j}-1\right)\), where \(z_{j}=y_{j} / \widehat{\mu}_{j}\) and \(\psi(\nu)\) is the digamma function \(d \log \Gamma(\nu) / d \nu\).
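The gamma estimating equation at the end of the problem is easy to solve numerically, since \(\log\nu-\psi(\nu)\) decreases monotonically from \(+\infty\) to 0. The sketch below assumes fitted means \(\widehat{\mu}_{j}\) are already available from a gamma fit (here they are simply simulated for illustration) and uses a bracketing root-finder.

```python
# A minimal sketch of solving log(nu) - psi(nu) = mean(z - log z - 1) for the
# gamma shape nu, assuming the fitted means mu_hat are already available from a
# gamma GLM fit; the data below are simulated purely for illustration.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(2)
true_nu = 4.0
mu_hat = rng.uniform(1, 5, 200)                        # stand-in for fitted means
y = rng.gamma(shape=true_nu, scale=mu_hat / true_nu)   # gamma data with mean mu_hat

z = y / mu_hat
rhs = np.mean(z - np.log(z) - 1)

# log(nu) - digamma(nu) is decreasing from +infinity to 0, so bracket and solve.
nu_hat = brentq(lambda nu: np.log(nu) - digamma(nu) - rhs, 1e-6, 1e6)
print(nu_hat)   # should be close to true_nu for a sample of this size
```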
Problem 7
Develop the details of local likelihood smoothing when a linear polynomial is fitted to Poisson data, using link function \(\log \mu=\beta_{0}+\beta_{1}\left(x-x_{0}\right)\).
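A possible starting point is to maximise, at each target point \(x_{0}\), the kernel-weighted Poisson log likelihood with linear predictor \(\beta_{0}+\beta_{1}(x-x_{0})\) and take \(\exp(\widehat{\beta}_{0})\) as the local estimate of \(\mu(x_{0})\). The sketch below does this by direct numerical optimisation; the Gaussian kernel, bandwidth and simulated data are assumptions made only for illustration.

```python
# A rough sketch of local likelihood smoothing for Poisson counts: at each target
# point x0, maximise the kernel-weighted Poisson log likelihood with linear
# predictor log mu = b0 + b1 (x - x0), and report exp(b0_hat) as the fitted mean
# at x0.  The kernel, bandwidth and simulated data are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, h = 200, 0.2
x = np.sort(rng.uniform(0, 1, n))
y = rng.poisson(np.exp(1 + np.sin(2 * np.pi * x)))

def local_fit(x0):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)         # kernel weights centred at x0
    def nll(beta):                                  # minus the weighted log likelihood
        eta = beta[0] + beta[1] * (x - x0)
        return -np.sum(w * (y * eta - np.exp(eta)))
    start = np.array([np.log(y.mean() + 0.5), 0.0])
    return np.exp(minimize(nll, start, method="BFGS").x[0])

grid = np.linspace(0.05, 0.95, 10)
print([round(local_fit(x0), 2) for x0 in grid])     # local estimates of mu(x0)
```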
Problem 7
Suppose that the cumulant-generating function of \(X\) can be written in the form \(m\{b(\theta+t)-b(\theta)\}\). Let \(\mathrm{E}(X)=\mu=m b^{\prime}(\theta)\), and let \(\kappa_{2}(\mu)\) and \(\kappa_{3}(\mu)\) be the variance and third cumulant of \(X\), respectively, expressed in terms of \(\mu\); \(\kappa_{2}(\mu)\) is the variance function \(V(\mu)\). (a) Show that $$ \kappa_{3}(\mu)=\kappa_{2}(\mu) \kappa_{2}^{\prime}(\mu) \quad \text { and } \quad \frac{\kappa_{3}}{\kappa_{2}^{2}}=\frac{d}{d \mu} \log \kappa_{2}(\mu). $$ Verify that the binomial cumulants have this form with \(b(\theta)=\log \left(1+e^{\theta}\right)\). (b) Show that if the derivatives of \(b(\theta)\) are all \(O(1)\), then \(Y=g(X)\) is approximately symmetrically distributed if \(g\) satisfies the second-order differential equation $$ 3 \kappa_{2}^{2}(\mu) g^{\prime \prime}(\mu)+g^{\prime}(\mu) \kappa_{3}(\mu)=0. $$ Show that if \(\kappa_{2}(\mu)\) and \(\kappa_{3}(\mu)\) are related as in (a), then $$ g(x)=\int^{x} \kappa_{2}^{-1 / 3}(\mu)\, d \mu. $$ (c) Hence find symmetrizing transformations for Poisson and binomial variables. (McCullagh and Nelder, 1989, Section 4.8)
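For the Poisson case \(\kappa_{2}(\mu)=\mu\), so the recipe in (b) gives \(g(x)\propto x^{2/3}\), and a quick Monte Carlo check of the resulting reduction in skewness is easy to code. The sketch below treats only that case; the mean and sample size are arbitrary.

```python
# A quick Monte Carlo check, for the Poisson case only, that the transformation
# g(x) = x^(2/3) obtained from g(x) = integral of kappa_2^{-1/3}(mu) d mu
# (with kappa_2(mu) = mu) reduces skewness; the mean 4 and sample size are arbitrary.
import numpy as np

rng = np.random.default_rng(4)
x = rng.poisson(4.0, size=200_000)

def skew(v):
    v = v - v.mean()
    return np.mean(v ** 3) / np.mean(v ** 2) ** 1.5

print(skew(x), skew(x ** (2 / 3.0)))   # raw skewness about 0.5, transformed much closer to 0
```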
Problem 9
At each of the doses \(x_{1}
Problem 11
Let \(Y\) be binomial with probability \(\pi=e^{\lambda} /\left(1+e^{\lambda}\right)\) and denominator \(m\). (a) Show that \(m-Y\) is binomial with \(\lambda^{\prime}=-\lambda\). Consider $$ \tilde{\lambda}=\log \left(\frac{Y+c_{1}}{m-Y+c_{2}}\right) $$ as an estimator of \(\lambda\). Show that in order to achieve consistency under the transformation \(Y \rightarrow m-Y\), we must have \(c_{1}=c_{2}\). (b) Write \(Y=m \pi+\sqrt{m \pi(1-\pi)} Z\), where \(Z=O_{p}(1)\) for large \(m\). Show that $$ \mathrm{E}\{\log (Y+c)\}=\log (m \pi)+\frac{c}{m \pi}-\frac{1-\pi}{2 m \pi}+O\left(m^{-3 / 2}\right). $$ Find the corresponding expansion for \(\mathrm{E}\{\log (m-Y+c)\}\), and with \(c_{1}=c_{2}=c\) find the value of \(c\) for which \(\tilde{\lambda}\) is unbiased for \(\lambda\) to order \(m^{-1}\). What is the connection to the empirical logistic transform? (Cox, 1970, Section 3.2)
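A small simulation makes the bias comparison in (b) concrete. The sketch below evaluates the bias of \(\tilde{\lambda}\) for several candidate values of \(c_{1}=c_{2}=c\); the choices of \(m\), \(\pi\) and the grid of \(c\) values are illustrative, and \(c=1/2\) is the value used by the empirical logistic transform.

```python
# A small simulation comparing the bias of lambda_tilde = log{(Y+c)/(m-Y+c)} for a
# few candidate values of c; the choices of m, pi and the grid of c values are
# illustrative.  (The empirical logistic transform takes c = 1/2.)
import numpy as np

rng = np.random.default_rng(5)
m, pi = 50, 0.4
lam = np.log(pi / (1 - pi))
y = rng.binomial(m, pi, size=1_000_000)

for c in (0.0, 0.25, 0.5, 1.0):
    lam_tilde = np.log((y + c) / (m - y + c))
    print(c, lam_tilde.mean() - lam)     # estimated bias; smallest in magnitude near c = 0.5
```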
Problem 17
Consider independent exponential variables \(Y_{j}\) with densities \(\lambda_{j} \exp\left(-\lambda_{j} y_{j}\right)\), where \(\lambda_{j}=\exp\left(\beta_{0}+\beta_{1} x_{j}\right)\) for \(j=1, \ldots, n\), with \(x_{j}\) scalar and \(\sum x_{j}=0\) without loss of generality.
(a) Find the expected information for \(\beta_{0}, \beta_{1}\) and show that the maximum likelihood estimator \(\widehat{\beta}_{1}\) has asymptotic variance \(\left(n m_{2}\right)^{-1}\), where \(m_{2}=n^{-1} \sum x_{j}^{2}\).
(b) Under no censoring, show that the partial log likelihood for \(\beta_{1}\) equals
$$
-\sum_{j=1}^{n} \log \left\{\sum_{i=j}^{n} \exp \left(\beta_{1} x_{(i)}\right)\right\},
$$
where the elements of the rank statistic \(R=\{(1), \ldots,(n)\}\) are determined by the ordering of the failure times, \(y_{(1)}<\cdots<y_{(n)}\).
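A rough Monte Carlo experiment can accompany part (b): maximise the displayed partial log likelihood over \(\beta_{1}\) for simulated exponential data and compare the spread of the estimates with the asymptotic variance \((n m_{2})^{-1}\) from part (a). Everything in the sketch below (sample size, parameter values, covariate distribution) is an illustrative assumption.

```python
# A rough Monte Carlo sketch for part (b), under assumed values of n, beta_0,
# beta_1 and a simulated covariate (none of these are specified in the problem):
# maximise the partial log likelihood -sum_j log{sum_{i=j}^n exp(beta_1 x_(i))}
# (the term sum_j beta_1 x_(j) vanishes because sum x_j = 0) and compare the
# spread of the estimates with the asymptotic variance 1/(n m_2) from part (a).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
n, beta0, beta1 = 100, 0.5, 1.0
x = rng.normal(size=n)
x -= x.mean()                        # impose sum x_j = 0, as in the problem
m2 = np.mean(x ** 2)

def partial_mle():
    y = rng.exponential(scale=np.exp(-(beta0 + beta1 * x)))   # rate exp(beta0 + beta1 x_j)
    xo = x[np.argsort(y)]            # covariates ordered by increasing failure time
    def neg_pl(b):
        # risk-set sums sum_{i=j}^n exp(b x_(i)) via a reversed cumulative sum
        risk = np.cumsum(np.exp(b * xo)[::-1])[::-1]
        return np.sum(np.log(risk))
    return minimize_scalar(neg_pl, bounds=(-5, 5), method="bounded").x

est = np.array([partial_mle() for _ in range(200)])
print(est.var(), 1.0 / (n * m2))     # Monte Carlo variance vs. asymptotic variance of the full MLE
```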
Problem 18
Suppose that \(n\) independent Poisson processes of rates \(\lambda_{j}(y)\) are observed simultaneously, and that the \(m\) events occur at \(0