
Over a period of 90 days a study was carried out on 1500 women. Its purpose was to investigate the relation between obstetrical practices and the time spent in the delivery suite by women giving birth. One thing that greatly affects this time is whether or not a woman has previously given birth. Unfortunately this vital information was lost, giving the researchers three options: (a) abandon the study; (b) go back to the medical records and find which women had previously given birth (very time-consuming); or (c) for each day check how many women had previously given birth (relatively quick). The statistical question arising was whether (c) would recover enough information about the parameter of interest.

Suppose that a linear model is appropriate for log time in delivery suite, and that the log time for a first delivery is normally distributed with mean \(\mu+\alpha\) and variance \(\sigma^{2}\), whereas for subsequent deliveries the mean log time is \(\mu\). Suppose that the times for all the women are independent, and that for each there is a probability \(\pi\) that the labour is her first, independent of the others. Further suppose that the women are divided into \(k\) groups corresponding to days and that each group has size \(m\); the overall number is \(n=mk\).

Under (c), show that the average log time on day \(j\), \(Z_{j}\), is normally distributed with mean \(\mu+R_{j}\alpha/m\) and variance \(\sigma^{2}/m\), where \(R_{j}\) is binomial with probability \(\pi\) and denominator \(m\). Hence show that the overall log likelihood is $$ \ell(\mu, \alpha)=-\frac{1}{2} k \log \left(2 \pi \sigma^{2} / m\right)-\frac{m}{2 \sigma^{2}} \sum_{j=1}^{k}\left(z_{j}-\mu-r_{j} \alpha / m\right)^{2}, $$ where \(z_{j}\) and \(r_{j}\) are the observed values of \(Z_{j}\) and \(R_{j}\) and we take \(\pi\) and \(\sigma^{2}\) to be known. If \(R_{j}\) has mean \(m \pi\) and variance \(m \tau^{2}\), show that the inverse expected information matrix is $$ I(\mu, \alpha)^{-1}=\frac{\sigma^{2}}{n \tau^{2}}\left(\begin{array}{cc} m \pi^{2}+\tau^{2} & -m \pi \\ -m \pi & m \end{array}\right). $$

(i) If \(m=1\), \(\tau^{2}=\pi(1-\pi)\), and \(\pi=n_{1}/n\), where \(n=n_{0}+n_{1}\), show that \(I(\mu, \alpha)^{-1}\) equals the variance matrix for the two-sample regression model. Explain why.
(ii) If \(\tau^{2}=0\), show that neither \(\mu\) nor \(\alpha\) is estimable; explain why.
(iii) If \(\tau^{2}=\pi(1-\pi)\), show that \(\mu\) is not estimable when \(\pi=1\), and that \(\alpha\) is not estimable when \(\pi=0\) or \(\pi=1\). Explain why the conditions for these two parameters to be estimable differ in form.
(iv) Show that the effect of grouping \((m>1)\) is that \(\operatorname{var}(\widehat{\alpha})\) is increased by a factor \(m\) regardless of \(\pi\) and \(\sigma^{2}\).
(v) It was known that \(\sigma^{2} \doteq 0.2\), \(m \doteq 1500/90\), \(\pi \doteq 0.3\). Calculate the standard error for \(\widehat{\alpha}\). It was known from other studies that first deliveries are typically 20–25% longer than subsequent ones. Show that an effect of size \(\alpha=\log(1.25)\) would be very likely to be detected based on the grouped data, but that an effect of size \(\alpha=\log(1.20)\) would be less certain to be detected, and discuss the implications.

Short Answer

The variance matrix of the two-sample regression model equals \(I(\mu, \alpha)^{-1}\) when \(m=1\) and \(\tau^2=\pi(1-\pi)\). With \(\tau^2 = 0\) the information matrix is singular and neither parameter can be estimated. \(\mu\) is not estimable when all deliveries are first (\(\pi=1\)), and \(\alpha\) is not estimable when \(\pi=0\) or \(\pi=1\), since then no comparison between first and subsequent deliveries is possible. Grouping inflates \(\operatorname{var}(\widehat{\alpha})\) by a factor \(m\). Numerically, a 25% effect (about 2.2 standard errors) would very likely be detected, whereas a 20% effect (about 1.8 standard errors) might well be missed.

Step by step solution

01

Establish the distribution of average log time

For a given day \(j\), condition on \(R_j\), the number of women having their first delivery that day. The \(m\) individual log times are independent normals with variance \(\sigma^2\); \(R_j\) of them have mean \(\mu+\alpha\) and the remaining \(m-R_j\) have mean \(\mu\). Their average \(Z_j\) is therefore normal with mean \(\mu + \frac{R_j \alpha}{m}\) and variance \(\frac{\sigma^2}{m}\), while \(R_j\) itself is binomial with denominator \(m\) and probability \(\pi\).
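In symbols, a short check of the mean and variance, writing \(Y_{1},\ldots,Y_{m}\) for the individual log times on day \(j\):
\[ Z_{j}=\frac{1}{m}\sum_{i=1}^{m}Y_{i},\qquad E(Z_{j}\mid R_{j})=\frac{R_{j}(\mu+\alpha)+(m-R_{j})\mu}{m}=\mu+\frac{R_{j}\alpha}{m},\qquad \operatorname{var}(Z_{j}\mid R_{j})=\frac{m\sigma^{2}}{m^{2}}=\frac{\sigma^{2}}{m}. \]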
02

Derive the log likelihood

Conditional on the observed counts \(r_j\), the averages \(Z_j\) are independent \(N(\mu + r_j\alpha/m, \sigma^2/m)\) variables, so the log likelihood is the sum of the \(k\) daily contributions: \[ \ell(\mu, \alpha) = -\frac{1}{2} k \log \left(2 \pi \sigma^{2} / m\right) - \frac{m}{2 \sigma^{2}} \sum_{j=1}^{k}\left(z_j - \mu - \frac{r_j \alpha}{m}\right)^2. \]
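To see this, write out the normal density for each day and take logs:
\[ \ell(\mu,\alpha)=\sum_{j=1}^{k}\log\left[\left(\frac{m}{2\pi\sigma^{2}}\right)^{1/2}\exp\left\{-\frac{m}{2\sigma^{2}}\left(z_{j}-\mu-\frac{r_{j}\alpha}{m}\right)^{2}\right\}\right], \]
which rearranges to the expression above.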
03

Derive expected information matrix

Treating \(R_j\) as random with mean \(m\pi\) and variance \(m\tau^{2}\) (in the binomial case, \(\tau^{2} = \pi(1-\pi)\)), we take expectations of the negative second derivatives of \(\ell(\mu,\alpha)\), over both \(Z_j\) and \(R_j\), to obtain the expected Fisher information matrix; its inverse is \[ I(\mu, \alpha)^{-1}=\frac{\sigma^{2}}{n \tau^{2}}\begin{pmatrix} m \pi^{2}+\tau^{2} & -m \pi \\ -m \pi & m \end{pmatrix}. \] This matrix is the (asymptotic) variance-covariance matrix of the parameter estimates.
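A sketch of the computation: differentiating the log likelihood twice gives
\[ -\frac{\partial^{2}\ell}{\partial\mu^{2}}=\frac{km}{\sigma^{2}},\qquad -\frac{\partial^{2}\ell}{\partial\mu\,\partial\alpha}=\frac{1}{\sigma^{2}}\sum_{j}r_{j},\qquad -\frac{\partial^{2}\ell}{\partial\alpha^{2}}=\frac{1}{m\sigma^{2}}\sum_{j}r_{j}^{2}, \]
and taking expectations with \(E(R_{j})=m\pi\) and \(E(R_{j}^{2})=m\tau^{2}+m^{2}\pi^{2}\) yields
\[ I(\mu,\alpha)=\frac{n}{\sigma^{2}}\begin{pmatrix}1 & \pi\\ \pi & \pi^{2}+\tau^{2}/m\end{pmatrix}, \]
whose determinant is \(n^{2}\tau^{2}/(m\sigma^{4})\); inverting gives the stated matrix.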
04

Solve subpart (i)

Set \(m=1\), \(\tau^2=\pi(1-\pi)\), and \(\pi=n_1/n\). Substituting into \(I(\mu, \alpha)^{-1}\) gives exactly the variance matrix of the two-sample regression model, in which \(n_0\) observations have mean \(\mu\) and \(n_1\) have mean \(\mu+\alpha\). This is as it should be: with \(m=1\) each "group" is a single woman whose delivery status is observed, so scheme (c) then carries the same information as knowing every woman's status individually.
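To verify the substitution: with \(m=1\) and \(\tau^{2}=\pi(1-\pi)\),
\[ I(\mu,\alpha)^{-1}=\frac{\sigma^{2}}{n\pi(1-\pi)}\begin{pmatrix}\pi & -\pi\\ -\pi & 1\end{pmatrix} =\begin{pmatrix}\sigma^{2}/n_{0} & -\sigma^{2}/n_{0}\\ -\sigma^{2}/n_{0} & \sigma^{2}(1/n_{0}+1/n_{1})\end{pmatrix}, \]
using \(\pi^{2}+\pi(1-\pi)=\pi\), \(\pi=n_{1}/n\), and \(1-\pi=n_{0}/n\). These are the familiar variances and covariance of \(\widehat{\mu}=\bar{y}_{0}\) and \(\widehat{\alpha}=\bar{y}_{1}-\bar{y}_{0}\) in the two-sample model.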
05

Solve subpart (ii)

If \(\tau^2=0\), then \(R_j\) is constant at its mean \(m\pi\): every day has the same mix of first and subsequent deliveries, so the design is collinear and the expected information matrix is singular (its determinant is proportional to \(\tau^{2}\)). The data then contain no way to separate \(\mu\) from \(\alpha\), as the display below makes explicit.
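With \(r_{j}=m\pi\) for every day,
\[ E(Z_{j})=\mu+\frac{m\pi}{m}\,\alpha=\mu+\pi\alpha\quad\text{for all } j, \]
so any pair \((\mu,\alpha)\) with the same value of \(\mu+\pi\alpha\) gives the same likelihood: only that combination is identifiable.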
06

Solve subpart (iii)

If \(\tau^2=\pi(1-\pi)\):
  • When \(\pi=1\), every delivery is a first delivery, so only the combination \(\mu+\alpha\) is observed and \(\mu\) cannot be separated from it; \(\mu\) is not estimable. (When \(\pi=0\) every mean equals \(\mu\), so \(\mu\) is estimable directly.)
  • When \(\pi=0\) or \(\pi=1\), only one type of delivery occurs, so the contrast \(\alpha\) between first and subsequent deliveries cannot be estimated.
The conditions differ in form because \(\mu\) is itself a group mean, observable whenever some deliveries are not first ones, whereas \(\alpha\) is a comparison that requires both types of delivery to be present in the sample.
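The variances make this visible: with \(\tau^{2}=\pi(1-\pi)\),
\[ \operatorname{var}(\widehat{\mu})=\frac{\sigma^{2}\{m\pi^{2}+\pi(1-\pi)\}}{n\pi(1-\pi)},\qquad \operatorname{var}(\widehat{\alpha})=\frac{m\sigma^{2}}{n\pi(1-\pi)}; \]
as \(\pi\to 1\) both diverge, while as \(\pi\to 0\), \(\operatorname{var}(\widehat{\mu})\to\sigma^{2}/n\) but \(\operatorname{var}(\widehat{\alpha})\to\infty\).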
07

Solve subpart (iv)

For \(m > 1\), \(\operatorname{var}(\widehat{\alpha}) = m\sigma^{2}/(n\tau^{2})\) is exactly \(m\) times its value for ungrouped data (\(m=1\)) with the same \(n\): recording only the daily counts \(r_j\), rather than each woman's individual status, costs a factor of \(m\) in the precision of \(\widehat{\alpha}\).
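Concretely, with \(\tau^{2}=\pi(1-\pi)\),
\[ \operatorname{var}(\widehat{\alpha})=\frac{m\sigma^{2}}{n\pi(1-\pi)}=m\times\frac{\sigma^{2}}{n\pi(1-\pi)}, \]
where the second factor is the variance that full individual-level information (\(m=1\)) would give; the inflation factor is \(m\) regardless of \(\pi\) and \(\sigma^{2}\).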
08

Solve subpart (v)

Given \(\sigma^2 = 0.2\), \(m = 1500/90 \doteq 16.67\), \(n = 1500\), and \(\pi = 0.3\), so that \(\tau^{2}=\pi(1-\pi)=0.21\), we have \[ \operatorname{var}(\widehat{\alpha})=\frac{m\sigma^{2}}{n\pi(1-\pi)}=\frac{16.67 \times 0.2}{1500 \times 0.21}\doteq 0.0106, \qquad \operatorname{SE}(\widehat{\alpha})\doteq 0.103. \] An effect of size \(\alpha=\log(1.25)\doteq 0.223\) is about \(2.2\) standard errors, so it would very likely be detected, whereas \(\alpha=\log(1.20)\doteq 0.182\) is about \(1.8\) standard errors, below the conventional two-sided \(1.96\) cutoff, so its detection is much less certain. The implication is that the quick scheme (c) retains enough information to detect a 25% lengthening of first deliveries, but could well miss a 20% one; if the smaller effect matters, the laborious option (b) would be needed.
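As a quick numerical check, here is a minimal sketch in Python (the variable names are ours, not from the text):

```python
import math

sigma2 = 0.2              # error variance of an individual log time
m = 1500 / 90             # group (day) size, about 16.67
n = 1500                  # total number of women
pi_ = 0.3                 # probability a labour is a woman's first
tau2 = pi_ * (1 - pi_)    # binomial case: tau^2 = pi(1 - pi) = 0.21

# var(alpha-hat) is the (2,2) entry of I(mu, alpha)^{-1}: m*sigma^2/(n*tau^2)
se_alpha = math.sqrt(m * sigma2 / (n * tau2))
print(f"SE(alpha-hat) = {se_alpha:.3f}")   # about 0.103

for ratio in (1.25, 1.20):
    alpha = math.log(ratio)
    print(f"alpha = log({ratio}) = {alpha:.3f}, "
          f"alpha / SE = {alpha / se_alpha:.2f}")   # about 2.17 and 1.77
```

Comparing these ratios with the usual \(1.96\) cutoff makes the contrast between the two effect sizes explicit.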


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Linear Regression
Linear Regression is a statistical method for modelling the relationship between a dependent variable and one or more explanatory variables. In this exercise it relates the log time spent in the delivery suite to whether or not the delivery is a woman's first.

For this particular problem, we assume the log time follows a normal distribution with different means based on the birth order (first time vs. subsequent deliveries). It simplifies the relationship by considering the log time as a linear function of a binary variable indicating the type of delivery.
  • Linear Model Assumption: A linear model for log time is appropriate because it lets the mean differ according to delivery order.
  • Normal Distribution: The log times are normally distributed, which underpins the usual linear-regression inference.
This setup helps in predicting the delivery time and understanding its variability based on certain important variables.
Likelihood Function
The Likelihood Function represents how likely it is to observe the collected data given a set of parameter values within a statistical model. For linear regression, the likelihood function is constructed from the normal distribution assumptions of the residuals.

With our exercise, the log likelihood function for model parameters \(\mu\) and \(\alpha\) given observed log times \(z_j\) and first delivery counts \(r_j\) is:

\[\ell(\mu, \alpha) = -\frac{1}{2} k \log \left(2 \pi \sigma^{2} / m\right) - \frac{m}{2 \sigma^{2}} \sum_{j=1}^{k}(z_j - \mu - \frac{r_j \alpha}{m})^2.\]

In this expression:
  • Log Likelihood Contribution: Each day's log likelihood is based on deviations of observed average log time from expected values.
  • Summation Across Days: Total likelihood aggregates information across all days to estimate parameters.
This formulation is crucial for statistical inference, as it allows estimation of parameters that explain the observed data.
Variance-Covariance Matrix
The Variance-Covariance Matrix in statistical models captures the variance of each parameter estimate and the covariance between pairs of parameters. For linear regression, this matrix can be derived from the Fisher information matrix.

In our scenario, the inverse expected information matrix depends on the group size \(m\), the proportion \(\pi\), and the count variance \(\tau^{2}\):

\[I(\mu, \alpha)^{-1}=\frac{\sigma^{2}}{n \tau^{2}}\begin{pmatrix} m \pi^{2}+\tau^{2} & -m \pi \\ -m \pi & m \end{pmatrix}\]

Key aspects include:
  • Variance Components: The diagonal elements are the variances of \(\widehat{\mu}\) and \(\widehat{\alpha}\).
  • Correlation Insights: The off-diagonal elements give the covariance between \(\widehat{\mu}\) and \(\widehat{\alpha}\), which is negative here.
This matrix is essential for understanding the precision of parameter estimates and how they may be influenced by the study's design and assumptions.
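Written out for this problem, the entries are
\[ \operatorname{var}(\widehat{\mu})=\frac{\sigma^{2}(m\pi^{2}+\tau^{2})}{n\tau^{2}},\qquad \operatorname{var}(\widehat{\alpha})=\frac{m\sigma^{2}}{n\tau^{2}},\qquad \operatorname{cov}(\widehat{\mu},\widehat{\alpha})=-\frac{m\pi\sigma^{2}}{n\tau^{2}}. \]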
Parameter Estimation
Parameter Estimation refers to the process of using sample data to estimate the parameters of a statistical model. In this model, \(\mu\) is the mean log delivery time for subsequent (non-first) deliveries and \(\alpha\) is the additional mean log time for a first delivery, so first deliveries have mean \(\mu+\alpha\).

Estimating these parameters involves maximizing the likelihood function to obtain values that best fit the observed data. The estimation process considers:
  • Effect of Grouping: The group size \(m\) affects the precision of \(\widehat{\alpha}\): grouping inflates its variance by the factor \(m\).
  • Estimation Challenges: Special cases where \(\pi=0\) or \(1\) lead to non-estimability of parameters, highlighting the need for variability in observations.
Correct estimation is critical in making valid inferences from the study, ensuring that interpretations of variables are statistically grounded.
Binomial Distribution
Binomial Distribution is a discrete probability distribution that models the number of successes in a fixed number of independent trials. Here, it helps model the number of first deliveries on any given day.

For the study, each woman's delivery on a given day is treated as an independent Bernoulli trial with probability \(\pi\) that it is her first delivery. Across a group of size \(m\), the number of first deliveries \(R_j\) follows a binomial distribution:
  • Mean and Variance: The expected number of first deliveries is \(m \pi\), with variance \(m\pi(1-\pi)\), written \(m\tau^{2}\) in the exercise.
  • Model Fit: The binomial model links the unobserved individual delivery statuses to the observed daily counts \(r_j\), which is what makes the grouped scheme (c) informative.
Understanding this distribution allows for better insight into how grouping and probabilities impact overall model predictions and parameter estimates.


