
(a) Show that when data \((X, Y)\) are available, but with values of \(Y\) missing at random, the log likelihood contribution can be written $$ \ell(\theta) \equiv I \log f(Y \mid X ; \theta)+\log f(X ; \theta) $$ and deduce that the expected information for \(\theta\) depends on the missingness mechanism but that the observed information does not.

(b) Consider binary pairs \((X, Y)\) with indicator \(I\) equal to zero when \(Y\) is missing; \(X\) is always seen. Their joint distribution is given by $$ \operatorname{Pr}(Y=1 \mid X=0)=\theta_{0}, \quad \operatorname{Pr}(Y=1 \mid X=1)=\theta_{1}, \quad \operatorname{Pr}(X=1)=\lambda, $$ while the missingness mechanism is $$ \operatorname{Pr}(I=1 \mid X=0)=\eta_{0}, \quad \operatorname{Pr}(I=1 \mid X=1)=\eta_{1}. $$

(i) Show that the likelihood contribution from \((X, Y, I)\) is $$ \begin{aligned} &\left[\left\{\theta_{1}^{Y}\left(1-\theta_{1}\right)^{1-Y}\right\}^{X}\left\{\theta_{0}^{Y}\left(1-\theta_{0}\right)^{1-Y}\right\}^{1-X}\right]^{I} \\ &\quad \times\left\{\eta_{0}^{I}\left(1-\eta_{0}\right)^{1-I}\right\}^{1-X}\left\{\eta_{1}^{I}\left(1-\eta_{1}\right)^{1-I}\right\}^{X} \times \lambda^{X}(1-\lambda)^{1-X}. \end{aligned} $$ Deduce that the observed information for \(\theta_{1}\) based on a random sample of size \(n\) is $$ -\frac{\partial^{2} \ell\left(\theta_{0}, \theta_{1}\right)}{\partial \theta_{1}^{2}}=\sum_{j=1}^{n} I_{j} X_{j}\left\{\frac{Y_{j}}{\theta_{1}^{2}}+\frac{1-Y_{j}}{\left(1-\theta_{1}\right)^{2}}\right\}. $$ Give corresponding expressions for \(\partial^{2} \ell\left(\theta_{0}, \theta_{1}\right) / \partial \theta_{0}^{2}\) and \(\partial^{2} \ell\left(\theta_{0}, \theta_{1}\right) / \partial \theta_{0} \partial \theta_{1}\).

(ii) Statistician A calculates the expected information treating \(I_{1}, \ldots, I_{n}\) as fixed and thereby ignores the missing data mechanism. Show that he gets \(i_{A}\left(\theta_{1}, \theta_{1}\right)=M \lambda /\left\{\theta_{1}\left(1-\theta_{1}\right)\right\}\), where \(M=\sum I_{j}\), and find the corresponding quantities \(i_{A}\left(\theta_{0}, \theta_{1}\right)\) and \(i_{A}\left(\theta_{0}, \theta_{0}\right)\). If he uses this procedure for many sets of data, deduce that on average \(M\) is replaced by \(n \operatorname{Pr}(I=1)=n\left\{\lambda \eta_{1}+(1-\lambda) \eta_{0}\right\}\).

(iii) Statistician B calculates the expected information taking into account the missingness mechanism. Show that she gets \(i_{B}\left(\theta_{1}, \theta_{1}\right)=n \lambda \eta_{1} /\left\{\theta_{1}\left(1-\theta_{1}\right)\right\}\), and obtain \(i_{B}\left(\theta_{0}, \theta_{1}\right)\) and \(i_{B}\left(\theta_{0}, \theta_{0}\right)\).

(iv) Show that A and B get the same expected information matrices only if \(Y\) is missing completely at random. Does this accord with the discussion above?

(c) Statistician C argues that expected information should never be used in data analysis: even if the data actually observed are complete, unless it can be guaranteed that data could not be missing at random for any reason, every expected information calculation should involve every potential missingness mechanism. Such a guarantee is impossible in practice, so no expected information calculation is ever correct. Do you agree?

(Kenward and Molenberghs, 1998)

Short Answer

Expert verified
Under missingness at random the observed information does not depend on the missingness mechanism, but the expected information does; Statisticians A and B obtain the same expected information only when the data are missing completely at random, so expected information should be used with caution, although Statistician C's blanket rejection of it goes too far.

Step by step solution

01

Understand the Likelihood Contribution

Here \(I\) is the indicator that \(Y\) is observed, so when \(I = 0\) the first term, which involves \(Y\), drops out. The expression \(\ell(\theta) = I \log f(Y \mid X ; \theta) + \log f(X ; \theta)\) separates the contribution of the (possibly missing) response \(Y\) from that of the always-observed \(X\); under missingness at random the mechanism generating \(I\) contributes only a factor that does not involve \(\theta\), as sketched below.
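A brief sketch of why this holds (assuming missingness at random, so that \(\operatorname{Pr}(I=0 \mid X, Y)=\operatorname{Pr}(I=0 \mid X)\) and the mechanism does not involve \(\theta\)): when \(Y\) is missing its density integrates out, since \(\int f(y \mid X ; \theta) \operatorname{Pr}(I=0 \mid X)\, d y=\operatorname{Pr}(I=0 \mid X)\). The contribution of \((X, I)\), together with \(Y\) when \(I=1\), is therefore $$ f(X ; \theta)\, \operatorname{Pr}(I \mid X)\, \{f(Y \mid X ; \theta)\}^{I}, $$ and because \(\operatorname{Pr}(I \mid X)\) is free of \(\theta\), taking logs gives \(\ell(\theta) \equiv I \log f(Y \mid X ; \theta)+\log f(X ; \theta)\) up to an additive constant.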
02

Evaluate Expected and Observed Information

The observed information is minus the second derivative of \(\ell(\theta)\) evaluated at the data actually seen, so it depends only on the realized values of \((X, Y, I)\). The expected information additionally averages over \(I\), and since the distribution of \(I\) is the missingness mechanism, that mechanism enters the expected but not the observed information under the MAR (missing at random) assumption.
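Concretely (continuing the sketch above), the observed information is $$ J(\theta)=-\frac{\partial^{2} \ell(\theta)}{\partial \theta^{2}}=-I\, \frac{\partial^{2} \log f(Y \mid X ; \theta)}{\partial \theta^{2}}-\frac{\partial^{2} \log f(X ; \theta)}{\partial \theta^{2}}, $$ a function of the realized \((X, Y, I)\) only, whereas the expected information \(i(\theta)=\mathrm{E}\{J(\theta)\}\) requires averaging over \(I\), and \(\mathrm{E}(I \mid X)=\operatorname{Pr}(I=1 \mid X)\) is precisely the missingness mechanism.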
03

Formulate Likelihood for (X, Y, I)

Because the missingness depends only on \(X\), the joint density of \((X, Y, I)\) factors as \(\{f(Y \mid X)\}^{I} \times \operatorname{Pr}(I \mid X) \times f(X)\). Substituting the Bernoulli probabilities \(\theta_{0}, \theta_{1}\) for \(Y\) given \(X\), \(\eta_{0}, \eta_{1}\) for \(I\) given \(X\), and \(\lambda\) for \(X\), with each factor raised to the appropriate indicator power, gives the likelihood contribution displayed in the problem.
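Taking logs, the part of the contribution that involves \((\theta_{0}, \theta_{1})\) is $$ I\left[X\left\{Y \log \theta_{1}+(1-Y) \log \left(1-\theta_{1}\right)\right\}+(1-X)\left\{Y \log \theta_{0}+(1-Y) \log \left(1-\theta_{0}\right)\right\}\right], $$ with the remaining terms involving only \(\eta_{0}, \eta_{1}\) and \(\lambda\).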
04

Derive Observed Information

Take logs and differentiate twice with respect to \(\theta_{1}\): only the term multiplied by \(I_{j} X_{j}\) involves \(\theta_{1}\), and minus its second derivative is the displayed observed information. The analogous term multiplied by \(I_{j}(1-X_{j})\) gives the second derivative with respect to \(\theta_{0}\), and the cross derivative is zero because no term contains both parameters.
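In detail, for a sample of size \(n\), $$ \frac{\partial \ell}{\partial \theta_{1}}=\sum_{j=1}^{n} I_{j} X_{j}\left\{\frac{Y_{j}}{\theta_{1}}-\frac{1-Y_{j}}{1-\theta_{1}}\right\}, \qquad -\frac{\partial^{2} \ell}{\partial \theta_{1}^{2}}=\sum_{j=1}^{n} I_{j} X_{j}\left\{\frac{Y_{j}}{\theta_{1}^{2}}+\frac{1-Y_{j}}{\left(1-\theta_{1}\right)^{2}}\right\}, $$ and by the same argument $$ -\frac{\partial^{2} \ell}{\partial \theta_{0}^{2}}=\sum_{j=1}^{n} I_{j}\left(1-X_{j}\right)\left\{\frac{Y_{j}}{\theta_{0}^{2}}+\frac{1-Y_{j}}{\left(1-\theta_{0}\right)^{2}}\right\}, \qquad \frac{\partial^{2} \ell}{\partial \theta_{0}\, \partial \theta_{1}}=0, $$ the cross derivative vanishing because no term of \(\ell\) involves both parameters.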
05

Calculate Expected Information (Statistician A)

Statistician A conditions on the observed missingness pattern, treating \(I_{1}, \ldots, I_{n}\) as fixed constants, and takes the expectation of the observed information over \((X_{j}, Y_{j})\) only. The number of complete cases \(M=\sum I_{j}\) then plays the role of the sample size in his expected information for \(\theta_{1}\) and \(\theta_{0}\), as sketched below.
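A sketch of the calculation: holding \(I_{1}, \ldots, I_{n}\) fixed and using \(\mathrm{E}(Y_{j} \mid X_{j}=1)=\theta_{1}\) and \(\operatorname{Pr}(X_{j}=1)=\lambda\), $$ i_{A}\left(\theta_{1}, \theta_{1}\right)=\sum_{j=1}^{n} I_{j}\, \mathrm{E}\left[X_{j}\left\{\frac{Y_{j}}{\theta_{1}^{2}}+\frac{1-Y_{j}}{\left(1-\theta_{1}\right)^{2}}\right\}\right]=\sum_{j=1}^{n} I_{j}\, \lambda\left\{\frac{1}{\theta_{1}}+\frac{1}{1-\theta_{1}}\right\}=\frac{M \lambda}{\theta_{1}\left(1-\theta_{1}\right)}, $$ with \(i_{A}\left(\theta_{0}, \theta_{0}\right)=M(1-\lambda) /\left\{\theta_{0}\left(1-\theta_{0}\right)\right\}\) and \(i_{A}\left(\theta_{0}, \theta_{1}\right)=0\). Over repeated data sets, \(\mathrm{E}(M)=n \operatorname{Pr}(I=1)=n\left\{\lambda \eta_{1}+(1-\lambda) \eta_{0}\right\}\).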
06

Calculate Expected Information (Statistician B)

Statistician B averages over the missingness indicators as well, so her calculation combines the sample size \(n\), the distribution of \(X\) through \(\lambda\), and the observation probabilities \(\eta_{0}\) and \(\eta_{1}\); each \(\theta\) parameter picks up the observation probability attached to the corresponding value of \(X\), as shown below.
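Averaging over the missingness indicators as well, with \(\mathrm{E}(I_{j} \mid X_{j}=1)=\eta_{1}\) and \(\mathrm{E}(I_{j} \mid X_{j}=0)=\eta_{0}\), $$ i_{B}\left(\theta_{1}, \theta_{1}\right)=\sum_{j=1}^{n} \mathrm{E}\left[I_{j} X_{j}\left\{\frac{Y_{j}}{\theta_{1}^{2}}+\frac{1-Y_{j}}{\left(1-\theta_{1}\right)^{2}}\right\}\right]=\frac{n \lambda \eta_{1}}{\theta_{1}\left(1-\theta_{1}\right)}, \qquad i_{B}\left(\theta_{0}, \theta_{0}\right)=\frac{n(1-\lambda) \eta_{0}}{\theta_{0}\left(1-\theta_{0}\right)}, \qquad i_{B}\left(\theta_{0}, \theta_{1}\right)=0. $$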
07

Compare Statistician A and B Results

Verify that the expected information matrices of A and B coincide only when \(Y\) is missing completely at random: averaged over data sets, A replaces \(M\) by \(n\{\lambda \eta_{1}+(1-\lambda)\eta_{0}\}\), and his matrix then equals B's only if \(\eta_{0}=\eta_{1}\), that is, the probability that \(Y\) is observed does not depend on \(X\). This accords with part (a): the observed information is unaffected by the mechanism, whereas the expected information is not.
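Comparing the diagonal entries with \(M\) replaced by its average \(n\left\{\lambda \eta_{1}+(1-\lambda) \eta_{0}\right\}\), $$ i_{A}=i_{B} \iff \lambda \eta_{1}+(1-\lambda) \eta_{0}=\eta_{1} \text{ and } \lambda \eta_{1}+(1-\lambda) \eta_{0}=\eta_{0} \iff \eta_{0}=\eta_{1} \quad(0<\lambda<1), $$ which is exactly the missing-completely-at-random condition.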
08

Discuss Statistician C's Perspective

Statistician C's position is too extreme. The example shows that expected information can mislead when the missingness mechanism is ignored, but this argues for conditioning on the data actually obtained, that is, for using the observed information, rather than for declaring every expected information calculation incorrect. When the realized data are complete and missingness is at random, inference based on observed information is valid without modelling every conceivable mechanism, so one can disagree with C while taking the warning about expected information seriously.


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Missing Data Mechanism
In statistical inference, understanding the missing data mechanism is crucial. It refers to the process that leads to data being incomplete. Specifically, this mechanism characterizes why certain data points are missing and influences how we account for them. There are different types of missing data mechanisms like Missing Completely at Random (MCAR), Missing at Random (MAR), and Not Missing at Random (NMAR).

For instance, in the given exercise each pair \((X, Y)\) comes with an indicator \(I\) recording whether \(Y\) is observed, and the missingness mechanism determines how the likelihood is constructed. When data are MAR, the probability of data being missing depends only on the observed data. Therefore, the missingness mechanism directly affects the expected information but not the observed information (the three mechanism types are written out in symbols below). This distinction highlights the significance of understanding the missing data mechanism, as it affects both the statistical analysis and the choice of methods for handling incomplete data.
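In symbols, with \(I\) the indicator that \(Y\) is observed (a standard summary added here for reference): $$ \text{MCAR: } \operatorname{Pr}(I=1 \mid X, Y)=\operatorname{Pr}(I=1), \qquad \text{MAR: } \operatorname{Pr}(I=1 \mid X, Y)=\operatorname{Pr}(I=1 \mid X), \qquad \text{NMAR: } \operatorname{Pr}(I=1 \mid X, Y) \text{ depends on } Y. $$ In part (b) of the exercise the mechanism depends on \(X\) only, so the data are MAR, and MCAR corresponds to the special case \(\eta_{0}=\eta_{1}\).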
Likelihood Functions
Likelihood functions are a cornerstone of statistical inference, providing a method to estimate parameters of a statistical model. The likelihood function represents the probability of observed data given specific model parameters. When we have missing data, especially in cases as presented in the problem, the likelihood function needs to be modified.

In our example, the contribution to the likelihood depends on whether \(Y\) is observed. The formula \(\ell(\theta) = I \log f(Y \mid X ; \theta) + \log f(X ; \theta)\) separates the contribution of the response from that of the covariate, and the indicator \(I\) ensures that the term involving \(Y\) is included only when \(Y\) is actually seen. The likelihood therefore uses all of the information in the data that were observed, which is what is needed for parameter estimation under missingness at random.
Observed Information
Observed information is minus the second derivative of the log-likelihood with respect to the parameter of interest, evaluated at the data in hand. It measures how much the data actually observed tell us about a parameter. Unlike expected information, observed information does not involve the missingness mechanism: in missing at random (MAR) settings it relies strictly on the available data.

In the provided solution, the expression \[-\frac{\partial^{2} \ell\left(\theta_{0}, \theta_{1}\right)}{\partial \theta_{1}^{2}}=\sum_{j=1}^{n} I_{j} X_{j}\left\{\frac{Y_{j}}{\theta_{1}^{2}}+\frac{1-Y_{j}}{\left(1-\theta_{1}\right)^{2}}\right\}\]gives us the observed information for \(\theta_{1}\). This formula highlights how only the seen data \(X\) and \(Y\) (when \(I=1\)) contribute to the information measure, thereby limiting the analysis to what is directly observed.
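As a purely illustrative sketch (not part of the original solution), the Python code below evaluates this observed information from arrays of indicators and responses; the function name, parameter values, and simulated mechanism are assumptions chosen only for the example.

```python
import numpy as np

def observed_info_theta1(I, X, Y, theta1):
    """Observed information for theta1: sum of Y/theta1^2 + (1-Y)/(1-theta1)^2
    over the complete cases with I = 1 and X = 1."""
    Y = np.where(I == 1, Y, 0)  # Y is irrelevant (and may be missing) when I = 0
    return np.sum(I * X * (Y / theta1**2 + (1 - Y) / (1 - theta1)**2))

# Simulate one data set from the model in part (b) -- illustrative values only
rng = np.random.default_rng(0)
n, lam, theta0, theta1, eta0, eta1 = 10_000, 0.4, 0.3, 0.6, 0.9, 0.5
X = rng.binomial(1, lam, n)
Y = rng.binomial(1, np.where(X == 1, theta1, theta0))
I = rng.binomial(1, np.where(X == 1, eta1, eta0))  # Y is observed iff I = 1

theta1_hat = Y[(I == 1) & (X == 1)].mean()  # complete-case MLE of theta1
print(observed_info_theta1(I, X, Y, theta1_hat))
```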
Expected Information
Expected information, or Fisher information, is the expectation of the observed information, equivalently the variance of the score (the first derivative of the log-likelihood). Because it averages over the full distribution of the data, it takes the missingness mechanism into account. In a statistical model it helps in assessing the efficiency of estimators and is widely used in large-sample inference.

In this scenario, we evaluate expected information differently depending on whether we consider the missingness mechanism. Statistician A treats the missingness indicators as fixed constants and ignores the mechanism that generated them, so his calculation is conditional on the observed pattern of missingness. Statistician B averages over the mechanism as well and computes \[i_{B}\left(\theta_{1}, \theta_{1}\right)=n \lambda \eta_{1} /\left\{\theta_{1}\left(1-\theta_{1}\right)\right\},\] showing explicitly how the probability of observing \(Y\) enters the pre-data expected information; on average the two calculations agree only when the data are missing completely at random.
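To make the comparison concrete, here is a small hypothetical numerical check (the parameter values are illustrative assumptions, not taken from the text): A's expected information for \(\theta_1\), with \(M\) replaced by its average \(n\{\lambda\eta_1+(1-\lambda)\eta_0\}\), matches B's only when \(\eta_0 = \eta_1\).

```python
def info_A_avg(n, lam, theta1, eta0, eta1):
    # Statistician A's i_A(theta1, theta1) with M replaced by its mean n*Pr(I = 1)
    M_bar = n * (lam * eta1 + (1 - lam) * eta0)
    return M_bar * lam / (theta1 * (1 - theta1))

def info_B(n, lam, theta1, eta0, eta1):
    # Statistician B's i_B(theta1, theta1) = n*lam*eta1 / {theta1*(1 - theta1)}
    return n * lam * eta1 / (theta1 * (1 - theta1))

n, lam, theta1 = 1_000, 0.4, 0.6
print(info_A_avg(n, lam, theta1, 0.9, 0.5), info_B(n, lam, theta1, 0.9, 0.5))  # MAR: the two differ
print(info_A_avg(n, lam, theta1, 0.7, 0.7), info_B(n, lam, theta1, 0.7, 0.7))  # MCAR (eta0 == eta1): they agree
```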


Most popular questions from this chapter

In a competing risks model with \(k=2\), write $$ \begin{aligned} \operatorname{Pr}(Y \leq y) &=\operatorname{Pr}(Y \leq y \mid I=1) \operatorname{Pr}(I=1)+\operatorname{Pr}(Y \leq y \mid I=2) \operatorname{Pr}(I=2) \\ &=p F_{1}(y)+(1-p) F_{2}(y), \end{aligned} $$ say. Hence find the cause-specific hazard functions \(h_{1}\) and \(h_{2}\), and express \(F_{1}, F_{2}\) and \(p\) in terms of them. Show that the likelihood for an uncensored sample may be written $$ p^{r}(1-p)^{n-r} \prod_{j=1}^{r} f_{1}\left(y_{j}\right) \prod_{j=r+1}^{n} f_{2}\left(y_{j}\right) $$ and find the likelihood when there is censoring. If \(f\left(y_{1} \mid y_{2}\right)\) and \(f\left(y_{2} \mid y_{1}\right)\) are arbitrary densities with support \(\left[y_{2}, \infty\right)\) and \(\left[y_{1}, \infty\right)\), show that the joint density $$ f\left(y_{1}, y_{2}\right)= \begin{cases}p f_{1}\left(y_{1}\right) f\left(y_{2} \mid y_{1}\right), & y_{1} \leq y_{2} \\ (1-p) f_{2}\left(y_{2}\right) f\left(y_{1} \mid y_{2}\right), & y_{1}>y_{2}\end{cases} $$ produces the same likelihoods. Deduce that the joint density is not identifiable.

Consider data from the straight-line regression model with \(n\) observations and $$ x_{j}= \begin{cases}0, & j=1, \ldots, m \\ 1, & \text { otherwise }\end{cases} $$ where \(m \leq n .\) Give a careful interpretation of the parameters \(\beta_{0}\) and \(\beta_{1}\), and find their least squares estimates. For what value(s) of \(m\) is \(\operatorname{var}\left(\widehat{\beta}_{1}\right)\) minimized, and for which maximized? Do your results make qualitative sense?

(a) Suppose that \(Y_{1}\) and \(Y_{2}\) have gamma densities (2.7) with parameters \(\lambda, \kappa_{1}\) and \(\lambda, \kappa_{2}\). Show that the conditional density of \(Y_{1}\) given \(Y_{1}+Y_{2}=s\) is $$ \frac{\Gamma\left(\kappa_{1}+\kappa_{2}\right)}{s^{\kappa_{1}+\kappa_{2}-1} \Gamma\left(\kappa_{1}\right) \Gamma\left(\kappa_{2}\right)} u^{\kappa_{1}-1}(s-u)^{\kappa_{2}-1}, \quad 0<u<s, $$ and establish that this is an exponential family. Give its mean and variance. (b) Show that \(Y_{1} /\left(Y_{1}+Y_{2}\right)\) has the beta density. (c) Discuss how you would use samples of form \(y_{1} /\left(y_{1}+y_{2}\right)\) to check the fit of this model with known \(\kappa_{1}\) and \(\kappa_{2}\).

Use the relation \(\mathcal{F}(y)=\exp \left\{-\int_{0}^{y} h(u)\, d u\right\}\) between the survivor and hazard functions to find the survivor functions corresponding to the following hazards: (a) \(h(y)=\lambda\); (b) \(h(y)=\lambda y^{\alpha}\); (c) \(h(y)=\alpha y^{\kappa-1} /\left(\beta+y^{\kappa}\right)\). In each case state what the distribution is. Show that \(\mathrm{E}\{1 / h(Y)\}=\mathrm{E}(Y)\) and hence find the means in (a), (b), and (c).

Suppose \(Y=\tau \varepsilon\), where \(\tau \in \mathbb{R}_{+}\) and \(\varepsilon\) is a random variable with known density \(f\). Show that this scale model is a group transformation model with free action \(g_{\tau}(y)=\tau y\). Show that \(s_{1}(Y)=\bar{Y}\) and \(s_{2}(Y)=\left(\sum Y_{j}^{2}\right)^{1 / 2}\) are equivariant and find the corresponding maximal invariants. Sketch the orbits when \(n=2\).
