Chapter 10: Problem 6

The rate of growth of an epidemic such as AIDS for a large population can be estimated fairly accurately and treated as a known function $g(t)$ of time $t$. In a smaller area where few cases have been observed the rate is hard to estimate because data are scarce. However, predictions of the numbers of future cases in such an area must be made in order to allocate resources such as hospital beds. A simple assumption is that cases in the area arise in a non-homogeneous Poisson process with rate $\lambda g(t)$, for which the mean number of cases in period $\left(t_{1}, t_{2}\right)$ is $\lambda \int_{t_{1}}^{t_{2}} g(t)\, dt$. Suppose that $N_{1}=n_{1}$ individuals with the disease have been observed in the period $(-\infty, 0)$, and that predictions are required for the number $N_{2}$ of cases to be observed in a future period $\left(t_{1}, t_{2}\right)$.

(a) Find the conditional distribution of $N_{2}$ given $N_{1}+N_{2}$, and show it to be free of $\lambda$. Deduce that a $(1-2 \alpha)$ prediction interval $\left(n_{-}, n_{+}\right)$ for $N_{2}$ is found by solving approximately the equations $$ \begin{aligned} &\alpha=\operatorname{Pr}\left(N_{2} \leq n_{-} \mid N_{1}+N_{2}=n_{1}+n_{-}\right) \\ &\alpha=\operatorname{Pr}\left(N_{2} \geq n_{+} \mid N_{1}+N_{2}=n_{1}+n_{+}\right) \end{aligned} $$

(b) Use a normal approximation to the conditional distribution in (a) to show that for moderate to large $n_{1}$, $n_{-}$ and $n_{+}$ are the solutions to the quadratic equation $$ (1-p)^{2} n^{2}+p(p-1)\left(2 n_{1}+z_{\alpha}^{2}\right) n+n_{1} p\left\{n_{1} p-(1-p) z_{\alpha}^{2}\right\}=0 $$ where $\Phi\left(z_{\alpha}\right)=\alpha$ and $$ p=\int_{t_{1}}^{t_{2}} g(t)\, dt \Big/\left\{\int_{t_{1}}^{t_{2}} g(t)\, dt+\int_{-\infty}^{0} g(t)\, dt\right\} $$

(c) Find approximate $0.90$ prediction intervals for the special case where $g(t)=2^{t / 2}$, so that the doubling time for the epidemic is two years, $n_{1}=10$ cases have been observed until time 0, and $t_{1}=0$, $t_{2}=1$ (next year) (Cox and Davison, 1989).

(d) Show that conditional on $A$, $R_{1}$ has a generalized linear model density with $$ b(\theta)=\log \left\{\sum_{u=u_{-}}^{u_{+}}\binom{m_{1}}{u}\binom{m_{0}}{a-u} e^{u \theta}\right\}, \quad u_{-}=\max \left\{0, a-m_{0}\right\}, \quad u_{+}=\min \left\{m_{1}, a\right\} $$ Deduce that a score test of $\Delta=1$ based on data from $n$ independent $2 \times 2$ tables $\left(R_{0 j}, m_{0 j}-R_{0 j} ; R_{1 j}, m_{1 j}-R_{1 j}\right)$ is obtained by treating $\sum R_{1 j}$ as approximately normal with mean and variance $$ \sum_{j=1}^{n} \frac{m_{1 j} a_{j}}{m_{0 j}+m_{1 j}}, \quad \sum_{j=1}^{n} \frac{m_{0 j} m_{1 j} a_{j}\left(m_{0 j}+m_{1 j}-a_{j}\right)}{\left(m_{0 j}+m_{1 j}\right)^{2}\left(m_{0 j}+m_{1 j}-1\right)} $$ When continuity-corrected this is the Mantel-Haenszel test (Mantel and Haenszel, 1959).

Short Answer

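For part (a), $N_2$ given $N_1 + N_2 = m$ is binomial with denominator $m$ and probability $p$, which is free of $\lambda$. For part (b), standardizing this binomial with its normal approximation and squaring gives the stated quadratic in $n$. In part (c), $g(t) = 2^{t/2}$ gives $p = 1 - 2^{-1/2} \approx 0.293$, and with $n_1 = 10$ the roots of the quadratic are about $0.7$ and $8.7$.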

Step by step solution

Understand the parameters and assumptions

The problem involves a non-homogeneous Poisson process with rate $\lambda g(t)$. The mean number of cases in the period $(t_1, t_2)$ is given by $\lambda \int_{t_1}^{t_2} g(t) \, dt$. We assume we have observed $n_1$ cases until time 0, and aim to predict the number of future cases $N_2$ in $(t_1, t_2)$.

Find the conditional distribution of $N_2$ given $N_1 + N_2$

Because the future period $(t_1, t_2)$ does not overlap $(-\infty, 0)$, $N_1$ and $N_2$ are independent Poisson variables with means $\lambda \int_{-\infty}^{0} g(t)\, dt$ and $\lambda \int_{t_1}^{t_2} g(t)\, dt$ respectively, and their sum is Poisson with mean equal to the sum of these. Given $N_1 + N_2 = m$, the conditional distribution of $N_2$ is therefore $\text{Binomial}(m, p)$, where $p = \int_{t_1}^{t_2} g(t) \, dt \big/ \left\{\int_{t_1}^{t_2} g(t) \, dt + \int_{-\infty}^{0} g(t) \, dt\right\}$. The factor $\lambda$ cancels from this ratio, so the distribution is free of $\lambda$.
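As a worked version of this step, write $\mu_1 = \lambda \int_{-\infty}^{0} g(t)\, dt$ and $\mu_2 = \lambda \int_{t_1}^{t_2} g(t)\, dt$. These two symbols are shorthand introduced here and do not appear in the original problem. Dividing the joint Poisson probability by the Poisson probability of the total gives

$$
\begin{aligned}
\Pr(N_2 = n_2 \mid N_1 + N_2 = m)
&= \frac{\dfrac{\mu_2^{n_2} e^{-\mu_2}}{n_2!} \cdot \dfrac{\mu_1^{m-n_2} e^{-\mu_1}}{(m-n_2)!}}{\dfrac{(\mu_1+\mu_2)^{m} e^{-(\mu_1+\mu_2)}}{m!}} \\
&= \binom{m}{n_2} p^{n_2} (1-p)^{m-n_2}, \qquad p = \frac{\mu_2}{\mu_1+\mu_2}.
\end{aligned}
$$

The exponentials cancel, and $\lambda$ appears in both the numerator and denominator of $p$, so it cancels as well.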

Establish prediction intervals

The limits solve $\alpha = \Pr(N_2 \leq n_- \mid N_1 + N_2 = n_1 + n_-)$ and $\alpha = \Pr(N_2 \geq n_+ \mid N_1 + N_2 = n_1 + n_+)$. Each probability is a tail of the $\text{Binomial}(n_1 + n, p)$ distribution, in which the candidate limit $n$ enters both as the cutoff and through the denominator $n_1 + n$. Because $N_2$ is discrete, the equations can generally be solved only approximately.
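To see how the two tail probabilities behave as the candidate limit changes, they can be tabulated over integer values of $n$. The sketch below uses only the Python standard library. The function names are illustrative, and the values $n_1 = 10$ and $p = 1 - 2^{-1/2}$ are taken from part (c) purely as an example.

```python
from math import comb


def binom_pmf(k, m, p):
    """Probability that a Binomial(m, p) variable equals k."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def lower_tail(n, n1, p):
    """Pr(N2 <= n | N1 + N2 = n1 + n) under Binomial(n1 + n, p)."""
    m = n1 + n
    return sum(binom_pmf(k, m, p) for k in range(0, n + 1))


def upper_tail(n, n1, p):
    """Pr(N2 >= n | N1 + N2 = n1 + n) under Binomial(n1 + n, p)."""
    m = n1 + n
    return sum(binom_pmf(k, m, p) for k in range(n, m + 1))


def tail_table(n1, p, n_max):
    """Rows (n, lower tail, upper tail) for n = 0, ..., n_max."""
    return [(n, lower_tail(n, n1, p), upper_tail(n, n1, p))
            for n in range(n_max + 1)]


if __name__ == "__main__":
    p = 1 - 2 ** -0.5  # example value, as in part (c)
    for n, lo, hi in tail_table(10, p, 12):
        print(f"n={n:2d}  lower={lo:.4f}  upper={hi:.4f}")
```

Reading down the table, the lower tail increases with $n$ and the upper tail decreases, so each equation is approximately satisfied where the corresponding column crosses $\alpha$.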

Normal approximation and quadratic equation

For moderate to large $n_1$, the $\text{Binomial}(n_1 + n, p)$ distribution is approximately normal with mean $(n_1 + n)p$ and variance $(n_1 + n)p(1-p)$. Setting the standardized value of $n$ equal to $\pm z_{\alpha}$, where $\Phi(z_{\alpha}) = \alpha$, and squaring gives a quadratic equation in $n$ with coefficients involving $p$, $n_1$, and $z_{\alpha}^2$. Its smaller and larger roots are the approximate limits $n_-$ and $n_+$.
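The algebra can be written out as follows, assuming no continuity correction is applied. Since $\Phi(z_\alpha) = \alpha$, the lower limit corresponds to the sign $+z_\alpha$ and the upper limit to $-z_\alpha$, but only $z_\alpha^2$ survives the squaring:

$$
\begin{aligned}
\frac{n - (n_1 + n)p}{\sqrt{(n_1 + n)\,p(1-p)}} &= \pm z_\alpha, \\
\left\{(1-p)n - n_1 p\right\}^2 &= z_\alpha^2 (n_1 + n)\, p(1-p), \\
(1-p)^2 n^2 + p(p-1)\left(2 n_1 + z_\alpha^2\right) n + n_1 p\left\{n_1 p - (1-p) z_\alpha^2\right\} &= 0.
\end{aligned}
$$

The last line collects powers of $n$ after expanding both sides of the second line, and matches the quadratic stated in part (b).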

Evaluate special case parameters

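For $g(t) = 2^{t/2}$ an antiderivative is $(2/\log 2)\, 2^{t/2}$, so $\int_{-\infty}^{0} g(t)\, dt = 2/\log 2$ and, with $t_1 = 0$ and $t_2 = 1$, $\int_{0}^{1} g(t)\, dt = (2/\log 2)(\sqrt{2} - 1)$. Hence $p = (\sqrt{2} - 1)/\sqrt{2} = 1 - 2^{-1/2} \approx 0.293$. A $0.90$ interval has $\alpha = 0.05$, for which $z_{\alpha}^2 \approx 2.71$. With $n_1 = 10$ the quadratic is approximately $0.5 n^2 - 4.70 n + 2.98 = 0$, whose roots give $n_- \approx 0.7$ and $n_+ \approx 8.7$.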
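The arithmetic for this special case can be checked numerically. The following is a minimal sketch using only the Python standard library. The function names `p_doubling` and `prediction_limits` are made up for this illustration, and the normal quantile comes from `statistics.NormalDist`.

```python
from math import log, sqrt
from statistics import NormalDist


def p_doubling(t1, t2):
    """p for g(t) = 2**(t/2), using the antiderivative (2/ln 2) * 2**(t/2)."""
    c = 2 / log(2)
    future = c * (2 ** (t2 / 2) - 2 ** (t1 / 2))  # integral over (t1, t2)
    past = c                                      # integral over (-inf, 0)
    return future / (future + past)


def prediction_limits(n1, p, alpha):
    """Roots of the quadratic from part (b), returned as (n_minus, n_plus)."""
    z2 = NormalDist().inv_cdf(alpha) ** 2  # only z_alpha squared is needed
    a = (1 - p) ** 2
    b = p * (p - 1) * (2 * n1 + z2)
    c = n1 * p * (n1 * p - (1 - p) * z2)
    root = sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


if __name__ == "__main__":
    p = p_doubling(0.0, 1.0)
    n_minus, n_plus = prediction_limits(n1=10, p=p, alpha=0.05)
    print(f"p = {p:.4f}, n_minus = {n_minus:.2f}, n_plus = {n_plus:.2f}")
```

The same two functions can be reused for other periods or doubling-time models by replacing `p_doubling` with the appropriate ratio of integrals.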

Generalized Linear Model derivation

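The stated $b(\theta)$ is the logarithm of the normalizing constant of the density $f(u; \theta) = \binom{m_1}{u}\binom{m_0}{a-u} \exp\{u\theta - b(\theta)\}$ for $u = u_-, \ldots, u_+$. This has the exponential family form used for generalized linear models, with canonical parameter $\theta$ and no dispersion parameter. It is the noncentral hypergeometric distribution of $R_1$ given $A = a$ in a $2 \times 2$ table, presumably with $\theta = \log \Delta$, so that $\Delta = 1$ corresponds to $\theta = 0$.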

Score test for generalized linear model

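For $n$ independent tables with a common $\theta$, the score statistic is $\sum_j \{R_{1j} - b_j'(\theta)\}$ and its variance is $\sum_j b_j''(\theta)$. Taking $\Delta = 1$ to correspond to $\theta = 0$, each conditional distribution is then hypergeometric, so $b_j'(0)$ and $b_j''(0)$ are the hypergeometric mean and variance that appear in the sums given in the problem. Treating $\sum R_{1j}$ as approximately normal with these moments gives the score test, and applying a continuity correction gives the Mantel-Haenszel test.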
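The claim that the mean and variance at $\theta = 0$ match the closed forms in the problem can be checked by direct summation. The sketch below is an illustration with hypothetical table margins, using only the Python standard library. The function names are assumptions of this example, and the statistic is computed without the continuity correction.

```python
from math import comb, exp, sqrt


def conditional_pmf(m0, m1, a, theta):
    """Density of R1 given A = a: weights C(m1,u) C(m0,a-u) e^{u theta}."""
    lo, hi = max(0, a - m0), min(m1, a)
    weights = {u: comb(m1, u) * comb(m0, a - u) * exp(u * theta)
               for u in range(lo, hi + 1)}
    total = sum(weights.values())  # equals exp(b(theta))
    return {u: w / total for u, w in weights.items()}


def moments(pmf):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(u * q for u, q in pmf.items())
    var = sum((u - mean) ** 2 * q for u, q in pmf.items())
    return mean, var


def null_moments(m0, m1, a):
    """Closed-form hypergeometric mean and variance (the theta = 0 case)."""
    m = m0 + m1
    return m1 * a / m, m0 * m1 * a * (m - a) / (m**2 * (m - 1))


def score_statistic(tables):
    """Uncorrected z statistic from rows (r1, m0, m1, a), one per table."""
    num = sum(r1 - null_moments(m0, m1, a)[0] for r1, m0, m1, a in tables)
    var = sum(null_moments(m0, m1, a)[1] for _, m0, m1, a in tables)
    return num / sqrt(var)


if __name__ == "__main__":
    # Hypothetical margins, chosen only to exercise the formulas.
    print(moments(conditional_pmf(7, 5, 4, 0.0)), null_moments(7, 5, 4))
    print(score_statistic([(3, 7, 5, 4), (2, 6, 6, 3)]))
```

Here `moments` computes the mean and variance by summing over the support, while `null_moments` evaluates the closed-form expressions. The two agree at $\theta = 0$ up to rounding error.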

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Poisson Process

Epidemiological modeling often makes use of the Poisson Process, which is a powerful statistical tool. It's used to model random events occurring over a period of time or in a specified area where each event happens independently of the last. In the context of disease modeling, this could refer to the arrival of new cases of a disease in a population.
In the given problem, we're looking at a non-homogeneous Poisson process. This means that the rate at which the events occur, symbolized as $\lambda g(t)$, can change over time, reflecting how an epidemic might speed up or slow down. Here, $g(t)$ is a known function depicting how the disease spreads over time.

For instance, if $g(t) = 2^{t/2}$, then $g(t+2) = 2g(t)$, so the rate at which new cases arise doubles every two years.

Utilizing this model allows for a better understanding of disease statistics especially when data is sparse; however, the challenge lies in accurately calculating future cases with limited previous data.

Prediction Intervals

Prediction intervals are crucial in making reliable future projections in epidemiological studies. Unlike confidence intervals, which focus on estimating population parameters, prediction intervals are about estimating the uncertainty of a future individual observation.
Prediction intervals take into account variability both within already observed data and in future data yet to be obtained. In the textbook problem, a $(1-2\alpha)$ prediction interval refers to the range where future case numbers $N_2$ are expected to fall, with certain probability, after considering previously observed cases $N_1$.

A practical example could be estimating how many new cases of an infection will appear next year given data from past years.

These intervals are determined by solving certain probability equations, which rely on the statistical distribution derived from the Poisson model, showing the application's depth in public health decision-making.

Normal Approximation

Solving the prediction-interval equations exactly with the binomial distribution requires a search over integer values of $n$, so a normal approximation is used instead. It is justified by the Central Limit Theorem: a binomial count is a sum of independent indicator variables, so its distribution is close to normal when the number of trials is large and $p$ is not too close to 0 or 1.
In this problem, the $\text{Binomial}(n_1 + n, p)$ distribution of $N_2$ given $N_1 + N_2 = n_1 + n$ is replaced by a normal distribution with mean $(n_1 + n)p$ and variance $(n_1 + n)p(1-p)$.

Equating the standardized value of $n$ to $\pm z_{\alpha}$ then reduces each prediction limit to a root of a single quadratic equation.

The approximation is intended for moderate to large $n_1$, in which case $n_{-}$ and $n_{+}$ are the two roots.

Generalized Linear Model

Generalized Linear Models (GLMs) extend traditional linear regression models to accommodate different types of response variables. They are particularly powerful in handling binary outcomes, counts, and more, making them highly applicable in medical statistics and epidemiology.
In the given problem, the GLM density emerges in part (d): conditional on $A$, the density of $R_1$ has exponential family form with canonical parameter $\theta$ and cumulant function $b(\theta)$, whose first and second derivatives give the conditional mean and variance of $R_1$.

By reformulating the problem in terms of GLM, we can handle outcomes that follow nonnormal distributions, like binomial or Poisson, common in epidemiological data.

The GLM further allows the formation of a score test, evaluating hypotheses concerning relationships within the data. The resulting score test could gauge the association between factors influencing the spread of an epidemic without being constrained by the need for normally distributed errors, offering more flexibility and accuracy in public health analysis.
