
The rate of growth of an epidemic such as AIDS for a large population can be estimated fairly accurately and treated as a known function \(g(t)\) of time \(t\). In a smaller area where few cases have been observed the rate is hard to estimate because data are scarce, yet predictions of the numbers of future cases in such an area must be made in order to allocate resources such as hospital beds. A simple assumption is that cases in the area arise in a non-homogeneous Poisson process with rate \(\lambda g(t)\), for which the mean number of cases in the period \((t_{1}, t_{2})\) is \(\lambda \int_{t_{1}}^{t_{2}} g(t)\,dt\). Suppose that \(N_{1}=n_{1}\) individuals with the disease have been observed in the period \((-\infty, 0)\), and that predictions are required for the number \(N_{2}\) of cases to be observed in a future period \((t_{1}, t_{2})\).

(a) Find the conditional distribution of \(N_{2}\) given \(N_{1}+N_{2}\), and show it to be free of \(\lambda\). Deduce that a \((1-2\alpha)\) prediction interval \((n_{-}, n_{+})\) for \(N_{2}\) is found by solving approximately the equations $$ \begin{aligned} &\alpha=\operatorname{Pr}\left(N_{2} \leq n_{-} \mid N_{1}+N_{2}=n_{1}+n_{-}\right) \\ &\alpha=\operatorname{Pr}\left(N_{2} \geq n_{+} \mid N_{1}+N_{2}=n_{1}+n_{+}\right) \end{aligned} $$

(b) Use a normal approximation to the conditional distribution in (a) to show that for moderate to large \(n_{1}\), \(n_{-}\) and \(n_{+}\) are the solutions to the quadratic equation $$ (1-p)^{2} n^{2}+p(p-1)\left(2 n_{1}+z_{\alpha}^{2}\right) n+n_{1} p\left\{n_{1} p-(1-p) z_{\alpha}^{2}\right\}=0 $$ where \(\Phi\left(z_{\alpha}\right)=\alpha\) and $$ p=\int_{t_{1}}^{t_{2}} g(t)\,dt \Big/ \left\{\int_{t_{1}}^{t_{2}} g(t)\,dt+\int_{-\infty}^{0} g(t)\,dt\right\} $$

(c) Find approximate \(0.90\) prediction intervals for the special case where \(g(t)=2^{t/2}\), so that the doubling time for the epidemic is two years, \(n_{1}=10\) cases have been observed until time 0, and \(t_{1}=0\), \(t_{2}=1\) (next year). (Cox and Davison, 1989)

(d) Show that conditional on \(A\), \(R_{1}\) has a generalized linear model density with $$ b(\theta)=\log \left\{\sum_{u=u_{-}}^{u_{+}}\binom{m_{1}}{u}\binom{m_{0}}{a-u} e^{u \theta}\right\}, \quad u_{-}=\max \left\{0, a-m_{0}\right\}, \quad u_{+}=\min \left\{m_{1}, a\right\} $$ Deduce that a score test of \(\Delta=1\) based on data from \(n\) independent \(2 \times 2\) tables \(\left(R_{0 j}, m_{0 j}-R_{0 j} ; R_{1 j}, m_{1 j}-R_{1 j}\right)\) is obtained by treating \(\sum R_{1 j}\) as approximately normal with mean and variance $$ \sum_{j=1}^{n} \frac{m_{1 j} a_{j}}{m_{0 j}+m_{1 j}}, \quad \sum_{j=1}^{n} \frac{m_{0 j} m_{1 j} a_{j}\left(m_{0 j}+m_{1 j}-a_{j}\right)}{\left(m_{0 j}+m_{1 j}\right)^{2}\left(m_{0 j}+m_{1 j}-1\right)} $$ When continuity-corrected this is the Mantel-Haenszel test. (Mantel and Haenszel, 1959)

Short Answer

Part (a): given \(N_1 + N_2\), \(N_2\) follows a binomial distribution whose success probability \(p\) does not involve \(\lambda\). Part (b): a normal approximation to that binomial turns the two tail-probability equations into a quadratic whose roots are \(n_-\) and \(n_+\). Part (c): with \(g(t) = 2^{t/2}\), evaluate the integrals to obtain \(p\) and solve the quadratic for the interval.

Step by step solution

01

Understand the parameters and assumptions

The problem involves a non-homogeneous Poisson process with rate \(\lambda g(t)\). The mean number of cases in the period \((t_1, t_2)\) is given by \(\lambda \int_{t_1}^{t_2} g(t) \, dt\). We assume we have observed \(n_1\) cases until time 0, and aim to predict the number of future cases \(N_2\) in \((t_1, t_2)\).
02

Find the conditional distribution of \(N_2\) given \(N_1 + N_2\)

To determine the conditional distribution, note that \(N_1\) and \(N_2\) are independent Poisson variables with means \(\lambda \int_{-\infty}^{0} g(t)\, dt\) and \(\lambda \int_{t_1}^{t_2} g(t)\, dt\) respectively. Given \(N_1 + N_2 = n_1 + n_2\), the conditional distribution of \(N_2\) is binomial, \(\text{Binomial}(n_1 + n_2, p)\), where $$ p = \int_{t_1}^{t_2} g(t)\, dt \Big/ \left\{\int_{t_1}^{t_2} g(t)\, dt + \int_{-\infty}^{0} g(t)\, dt\right\}. $$ Since \(\lambda\) cancels from numerator and denominator, this distribution is free of \(\lambda\).
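A quick way to convince oneself that the conditional distribution is binomial and free of \(\lambda\) is simulation. The sketch below uses illustrative integral values (4 for the past, 2 for the future, so \(p = 1/3\)) and Knuth's product-of-uniforms algorithm for Poisson sampling, since Python's standard library has no Poisson generator; all names are mine, not from the text.

```python
import math
import random

def poisson(mu, rng):
    """Knuth's algorithm: count uniforms until their product drops below e^-mu."""
    limit = math.exp(-mu)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def conditional_mean_n2(lam, i_past, i_future, total, draws, seed):
    """Empirical mean of N2 given N1 + N2 == total, for rate multiplier lam."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        n1 = poisson(lam * i_past, rng)    # cases in (-inf, 0)
        n2 = poisson(lam * i_future, rng)  # cases in (t1, t2)
        if n1 + n2 == total:
            values.append(n2)
    return sum(values) / len(values)

# Illustrative integrals: i_past = 4, i_future = 2, so p = 2/6 = 1/3.
# Conditional on N1 + N2 = 6, theory says N2 ~ Binomial(6, 1/3), mean 2,
# for *any* value of lambda.
m_a = conditional_mean_n2(1.0, 4.0, 2.0, total=6, draws=40000, seed=1)
m_b = conditional_mean_n2(2.0, 4.0, 2.0, total=6, draws=40000, seed=2)
print(round(m_a, 2), round(m_b, 2))  # both close to 2.0
```

Doubling \(\lambda\) changes both Poisson means but leaves the conditional mean near \(6 \times 1/3 = 2\), illustrating the freedom from \(\lambda\).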
03

Establish prediction intervals

We need to solve the equations \(\alpha = \Pr(N_2 \leq n_- \mid N_1 + N_2 = n_1 + n_-)\) and \(\alpha = \Pr(N_2 \geq n_+ \mid N_1 + N_2 = n_1 + n_+)\), in each case using the binomial distribution established above with the appropriate number of trials. Because \(N_2\) is integer-valued, these equations can only be solved approximately: \(n_-\) and \(n_+\) are the corresponding binomial percentiles.
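The defining equation for \(n_+\) can also be solved exactly by scanning binomial tail probabilities, which later serves as a check on the normal approximation. A minimal sketch; the function names and the hand-checkable values \(n_1 = 1\), \(p = 0.5\) are mine, chosen so the tail has the closed form \((n+2)/2^{n+1}\):

```python
from math import comb

def upper_tail(n, n1, p):
    """Pr(N2 >= n | N1 + N2 = n1 + n) with N2 ~ Binomial(n1 + n, p)."""
    total = n1 + n
    return sum(comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(n, total + 1))

def n_plus(n1, p, alpha):
    """Smallest n with Pr(N2 >= n | N1 + N2 = n1 + n) <= alpha."""
    n = 0
    while upper_tail(n, n1, p) > alpha:
        n += 1
    return n

# With n1 = 1, p = 1/2 the tail is (n + 2) / 2**(n + 1), which first drops
# below 0.05 at n = 7 (9/256 ~ 0.035); n = 6 gives 8/128 = 0.0625.
print(n_plus(1, 0.5, 0.05))  # 7
```

The lower limit is found the same way by scanning \(\Pr(N_2 \leq n \mid N_1 + N_2 = n_1 + n)\) from below.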
04

Normal approximation and quadratic equation

For moderate to large \(n_1\), the binomial distribution of \(N_2\) given \(N_1 + N_2 = n_1 + n\) may be replaced by a normal distribution with mean \((n_1 + n)p\) and variance \((n_1 + n)p(1-p)\). Setting the standardized value of \(n\) equal to \(\mp z_{\alpha}\), where \(\Phi(z_{\alpha}) = \alpha\), and squaring yields the stated quadratic equation in \(n\); its two roots are the approximate prediction limits \(n_-\) and \(n_+\).
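A sketch of the algebra behind the quadratic, starting from the normal approximation with the binomial's mean and variance:

```latex
% Normal approximation: N_2 \mid N_1+N_2 = n_1+n \;\approx\; N\{(n_1+n)p,\,(n_1+n)p(1-p)\}.
% Both limits satisfy (limit minus mean) = \mp z_\alpha standard deviations, so squaring:
\{n(1-p) - n_1 p\}^2 = z_\alpha^2\,(n_1+n)\,p(1-p)
% Expanding the left side and collecting powers of n:
(1-p)^2 n^2 - p(1-p)\left(2 n_1 + z_\alpha^2\right) n
  + n_1 p\left\{n_1 p - (1-p) z_\alpha^2\right\} = 0
% which is the stated quadratic, since -p(1-p) = p(p-1).
```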
05

Evaluate special case parameters

For \(g(t) = 2^{t/2}\) the antiderivative is \(2 \cdot 2^{t/2}/\log 2\), so with \(t_1 = 0\) and \(t_2 = 1\) we get \(\int_0^1 g(t)\, dt = 2(\sqrt{2}-1)/\log 2\) and \(\int_{-\infty}^{0} g(t)\, dt = 2/\log 2\), giving \(p = (\sqrt{2}-1)/\sqrt{2} = 1 - 2^{-1/2} \approx 0.293\). Given \(n_1 = 10\) and \(z_{0.05} \approx -1.645\), solving the quadratic yields the approximate \(0.90\) prediction interval.
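A numerical sketch of this computation (variable names and the rounding at the end are mine; \(z_{0.05}\) is the usual standard-normal quantile):

```python
import math

# Part (c): g(t) = 2**(t/2); antiderivative is 2 * 2**(t/2) / log 2.
future = 2 * (math.sqrt(2) - 1) / math.log(2)   # integral over (0, 1)
past = 2 / math.log(2)                          # integral over (-inf, 0)
p = future / (future + past)                    # = 1 - 2**-0.5 ~ 0.2929

n1 = 10
z = -1.6449                                     # Phi(z) = 0.05

# Coefficients of the quadratic from part (b).
a = (1 - p)**2
b = p * (p - 1) * (2*n1 + z**2)
c = n1 * p * (n1*p - (1 - p) * z**2)

disc = math.sqrt(b**2 - 4*a*c)
n_minus, n_plus = (-b - disc) / (2*a), (-b + disc) / (2*a)
print(round(n_minus, 2), round(n_plus, 2))  # roughly 0.68 and 8.72
```

Rounding outward suggests an approximate 0.90 prediction interval of about \((0, 9)\) cases for next year.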
06

Generalized Linear Model derivation

Conditional on \(A = a\), the count \(R_1\) follows a noncentral hypergeometric distribution with probabilities proportional to \(\binom{m_1}{u}\binom{m_0}{a-u}e^{u\theta}\), where \(\theta = \log \Delta\) is the log odds ratio. This is an exponential-family density whose cumulant function is the given \(b(\theta)\), so it has generalized linear model form, and conditioning has eliminated the nuisance baseline probabilities from inference about \(\Delta\).
07

Score test for generalized linear model

The score test of \(\Delta = 1\) evaluates the derivatives of \(b(\theta)\) at \(\theta = 0\), where the conditional distribution of each \(R_{1j}\) is central hypergeometric. Summing the resulting means and variances over the \(n\) independent tables, \(\sum R_{1j}\) is treated as approximately normal with the mean and variance shown; with a continuity correction this is the Mantel-Haenszel test.
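Under \(\Delta = 1\) the per-table mean and variance quoted above are the standard central hypergeometric moments. A small enumeration check (the table dimensions \(m_1 = 4\), \(m_0 = 6\), \(a = 5\) are illustrative, not from the text):

```python
from math import comb

def hypergeom_moments(m1, m0, a):
    """Mean and variance of R1 given A = a when Delta = 1, by direct
    enumeration of Pr(R1 = u) proportional to C(m1, u) * C(m0, a - u)."""
    lo, hi = max(0, a - m0), min(m1, a)
    weights = {u: comb(m1, u) * comb(m0, a - u) for u in range(lo, hi + 1)}
    total = sum(weights.values())
    mean = sum(u * w for u, w in weights.items()) / total
    var = sum((u - mean)**2 * w for u, w in weights.items()) / total
    return mean, var

m1, m0, a = 4, 6, 5
mean, var = hypergeom_moments(m1, m0, a)

# Closed forms appearing in the Mantel-Haenszel statistic:
mh_mean = m1 * a / (m0 + m1)
mh_var = m0 * m1 * a * (m0 + m1 - a) / ((m0 + m1)**2 * (m0 + m1 - 1))
print(mean == mh_mean, abs(var - mh_var) < 1e-12)  # True True
```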


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Poisson Process
Epidemiological modeling often makes use of the Poisson Process, which is a powerful statistical tool. It's used to model random events occurring over a period of time or in a specified area where each event happens independently of the last. In the context of disease modeling, this could refer to the arrival of new cases of a disease in a population.
In the given problem, we're looking at a non-homogeneous Poisson process. This means that the rate at which the events occur, symbolized as \(\lambda g(t)\), can change over time, reflecting how an epidemic might speed up or slow down. Here, \(g(t)\) is a known function depicting how the disease spreads over time.
  • For instance, if \(g(t) = 2^{t/2}\), the rate at which new cases arise doubles every two years.
Utilizing this model allows for a better understanding of disease statistics especially when data is sparse; however, the challenge lies in accurately calculating future cases with limited previous data.
Prediction Intervals
Prediction intervals are crucial in making reliable future projections in epidemiological studies. Unlike confidence intervals, which focus on estimating population parameters, prediction intervals are about estimating the uncertainty of a future individual observation.
Prediction intervals take into account variability both within already observed data and in future data yet to be obtained. In the textbook problem, a \((1-2\alpha)\) prediction interval refers to the range where future case numbers \(N_2\) are expected to fall, with certain probability, after considering previously observed cases \(N_1\).
  • A practical example could be estimating how many new cases of an infection will appear next year given data from past years.
These intervals are determined by solving certain probability equations, which rely on the statistical distribution derived from the Poisson model, showing the application's depth in public health decision-making.
Normal Approximation
For larger sample sizes, calculating prediction intervals using a binomial distribution becomes complex, so instead, we approximate using a normal distribution. This simplifies calculations immensely thanks to the Central Limit Theorem, which suggests that as the number of observations grows, the distribution of the sample mean becomes normal, regardless of the original distribution.
In this problem, the binomial conditional distribution of \(N_2\) given \(N_1 + N_2 = n_1 + n\) is approximated by a normal distribution with mean \((n_1 + n)p\) and variance \((n_1 + n)p(1-p)\). This turns the tail-probability equations, which have no closed-form binomial solution, into a quadratic equation that can be solved directly.
  • This normal approximation ensures that even slightly complex models become manageable, aiding in faster and more accessible solutions in predicting future pandemic impacts.
It ultimately streamlines the prediction interval solution for \(N_2\) when \(n_1, n_{-},\) and \(n_{+}\) values are moderate to large.
Generalized Linear Model
Generalized Linear Models (GLMs) extend traditional linear regression models to accommodate different types of response variables. They are particularly powerful in handling binary outcomes, counts, and more, making them highly applicable in medical statistics and epidemiology.
In the given problem, the GLM density emerges in part (d): conditioning on the table total \(A\) gives \(R_1\) an exponential-family density with canonical parameter \(\theta = \log \Delta\), which places inference about the odds ratio squarely within the GLM framework.
  • By reformulating the problem in terms of GLM, we can handle outcomes that follow nonnormal distributions, like binomial or Poisson, common in epidemiological data.
The GLM further allows the formation of a score test, evaluating hypotheses concerning relationships within the data. The resulting score test could gauge the association between factors influencing the spread of an epidemic without being constrained by the need for normally distributed errors, offering more flexibility and accuracy in public health analysis.


