(a) Show that when data \((X, Y)\) are available, but with values of \(Y\) missing
at random, the log likelihood contribution can be written
$$
\ell(\theta) \equiv I \log f(Y \mid X ; \theta)+\log f(X ; \theta)
$$
where \(I=1\) indicates that \(Y\) is observed, and deduce that the expected
information for \(\theta\) depends on the missingness mechanism but that the
observed information does not.
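One way to see this, as a sketch: write \(\eta\) for the parameters of the missingness mechanism and take missing at random to mean \(\operatorname{Pr}(I=1 \mid X, Y)=\operatorname{Pr}(I=1 \mid X)\), which is natural here because \(X\) is always observed. The contribution of one observation to the likelihood is then
$$
\left\{f(Y \mid X ; \theta)\right\}^{I} f(X ; \theta) \operatorname{Pr}(I \mid X ; \eta),
$$
whose logarithm equals \(I \log f(Y \mid X ; \theta)+\log f(X ; \theta)\) plus a term free of \(\theta\). The observed information involves only the realized value of \(I\), whereas its expectation involves \(\operatorname{E}(I \mid X)=\operatorname{Pr}(I=1 \mid X ; \eta)\) and hence the missingness mechanism.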
(b) Consider binary pairs \((X, Y)\) with indicator \(I\) equal to zero when \(Y\)
is missing; \(X\) is always seen. Their joint distribution is given by
$$
\operatorname{Pr}(Y=1 \mid X=0)=\theta_{0}, \quad \operatorname{Pr}(Y=1 \mid
X=1)=\theta_{1}, \quad \operatorname{Pr}(X=1)=\lambda
$$
while the missingness mechanism is
$$
\operatorname{Pr}(I=1 \mid X=0)=\eta_{0}, \quad \operatorname{Pr}(I=1 \mid
X=1)=\eta_{1}
$$
(i) Show that the likelihood contribution from \((X, Y, I)\) is
$$
\begin{aligned}
&\left[\left\{\theta_{1}^{Y}\left(1-\theta_{1}\right)^{1-Y}\right\}^{X}\left\{\theta_{0}^{Y}\left(1-\theta_{0}\right)^{1-Y}\right\}^{1-X}\right]^{I} \\
&\quad \times\left\{\eta_{0}^{I}\left(1-\eta_{0}\right)^{1-I}\right\}^{1-X}\left\{\eta_{1}^{I}\left(1-\eta_{1}\right)^{1-I}\right\}^{X} \times \lambda^{X}(1-\lambda)^{1-X}
\end{aligned}
$$
Deduce that the observed information for \(\theta_{1}\) based on a random sample
of size \(n\) is
$$
-\frac{\partial^{2} \ell\left(\theta_{0}, \theta_{1}\right)}{\partial \theta_{1}^{2}}=\sum_{j=1}^{n} I_{j} X_{j}\left\{\frac{Y_{j}}{\theta_{1}^{2}}+\frac{1-Y_{j}}{\left(1-\theta_{1}\right)^{2}}\right\}
$$
Give corresponding expressions for \(\partial^{2} \ell\left(\theta_{0},
\theta_{1}\right) / \partial \theta_{0}^{2}\) and \(\partial^{2}
\ell\left(\theta_{0}, \theta_{1}\right) / \partial \theta_{0} \partial
\theta_{1}\).
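Not part of the exercise, but a minimal numerical sanity check of the expression above, assuming NumPy and purely illustrative parameter values: it simulates the model of (b) and compares the analytic observed information for \(\theta_{1}\) with a central-difference second derivative of the log likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
theta0, theta1, lam, eta0, eta1 = 0.3, 0.6, 0.4, 0.7, 0.5  # illustrative values

# Simulate (X, Y, I) from the model in (b); Y counts as observed only when I = 1.
X = rng.binomial(1, lam, n)
Y = rng.binomial(1, np.where(X == 1, theta1, theta0))
I = rng.binomial(1, np.where(X == 1, eta1, eta0))

def loglik(t0, t1):
    # terms of the log likelihood that involve (theta0, theta1): sum of I * log f(Y | X; theta)
    th = np.where(X == 1, t1, t0)
    return np.sum(I * (Y * np.log(th) + (1 - Y) * np.log(1 - th)))

# central-difference second derivative in theta1 versus the analytic expression
h = 1e-4
numeric = -(loglik(theta0, theta1 + h) - 2 * loglik(theta0, theta1)
            + loglik(theta0, theta1 - h)) / h ** 2
analytic = np.sum(I * X * (Y / theta1 ** 2 + (1 - Y) / (1 - theta1) ** 2))
print(numeric, analytic)  # the two numbers should agree closely
```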
(ii) Statistician A calculates the expected information treating \(I_{1},
\ldots, I_{n}\) as fixed and thereby ignores the missing data mechanism. Show
that he gets \(i_{A}\left(\theta_{1}, \theta_{1}\right)=M \lambda /\left\{\theta_{1}\left(1-\theta_{1}\right)\right\}\),
where \(M=\sum I_{j}\), and find the corresponding quantities
\(i_{A}\left(\theta_{0}, \theta_{1}\right)\) and \(i_{A}\left(\theta_{0},
\theta_{0}\right)\). If he uses this procedure for many sets of data, deduce
that on average \(M\) is replaced by \(n \operatorname{Pr}(I=1)=n\left\{\lambda
\eta_{1}+(1-\lambda) \eta_{0}\right\}\).
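Again as a hedged, illustrative check (NumPy assumed, parameter values arbitrary): over repeated data sets the average of \(M / n\) should settle near \(\lambda \eta_{1}+(1-\lambda) \eta_{0}\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5000
lam, eta0, eta1 = 0.4, 0.7, 0.5  # illustrative values

# M = number of observed Y's in each simulated data set
M = np.empty(reps)
for r in range(reps):
    X = rng.binomial(1, lam, n)
    I = rng.binomial(1, np.where(X == 1, eta1, eta0))
    M[r] = I.sum()

# average of M/n versus Pr(I = 1) = lam * eta1 + (1 - lam) * eta0
print(M.mean() / n, lam * eta1 + (1 - lam) * eta0)
```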
(iii) Statistician B calculates the expected information taking into account
the missingness mechanism. Show that she gets \(i_{B}\left(\theta_{1},
\theta_{1}\right)=n \lambda \eta_{1}
/\left\{\theta_{1}\left(1-\theta_{1}\right)\right\}\), and obtain
\(i_{B}\left(\theta_{0}, \theta_{1}\right)\) and \(i_{B}\left(\theta_{0},
\theta_{0}\right)\).
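A similar illustrative check for \(i_{B}\left(\theta_{1}, \theta_{1}\right)\), under the same assumptions as the sketches above: averaging the observed information for \(\theta_{1}\) over data sets generated from the full model, including the missingness mechanism, should approach \(n \lambda \eta_{1} /\left\{\theta_{1}\left(1-\theta_{1}\right)\right\}\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 5000
theta0, theta1, lam, eta0, eta1 = 0.3, 0.6, 0.4, 0.7, 0.5  # illustrative values

# average the observed information for theta1 over repeated data sets
obs_info = np.empty(reps)
for r in range(reps):
    X = rng.binomial(1, lam, n)
    Y = rng.binomial(1, np.where(X == 1, theta1, theta0))
    I = rng.binomial(1, np.where(X == 1, eta1, eta0))
    obs_info[r] = np.sum(I * X * (Y / theta1 ** 2 + (1 - Y) / (1 - theta1) ** 2))

print(obs_info.mean(), n * lam * eta1 / (theta1 * (1 - theta1)))  # should be close
```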
(iv) Show that \(\mathrm{A}\) and \(\mathrm{B}\) get the same expected information
matrices only if \(Y\) is missing completely at random. Does this accord with
the discussion above?
(c) Statistician C argues that expected information should never be used in
data analysis: even if the data actually observed are complete, unless it can
be guaranteed that data could not have been missing at random for any reason, every
expected information calculation should involve every potential missingness
mechanism. Such a guarantee is impossible in practice, so no expected
information calculation is ever correct. Do you agree?
(Kenward and Molenberghs, 1998)