Chapter 5: Problem 16
(a) Show that when data \((X, Y)\) are available, but with values of \(Y\) missing at random, the log likelihood contribution can be written
$$
\ell(\theta) \equiv I \log f(Y \mid X ; \theta)+\log f(X ; \theta),
$$
and deduce that the expected information for \(\theta\) depends on the missingness mechanism but that the observed information does not.

(b) Consider binary pairs \((X, Y)\) with indicator \(I\) equal to zero when \(Y\) is missing; \(X\) is always seen. Their joint distribution is given by
$$
\operatorname{Pr}(Y=1 \mid X=0)=\theta_{0}, \quad \operatorname{Pr}(Y=1 \mid X=1)=\theta_{1}, \quad \operatorname{Pr}(X=1)=\lambda,
$$
while the missingness mechanism is
$$
\operatorname{Pr}(I=1 \mid X=0)=\eta_{0}, \quad \operatorname{Pr}(I=1 \mid X=1)=\eta_{1}.
$$

(i) Show that the likelihood contribution from \((X, Y, I)\) is
$$
\begin{aligned}
&\left[\left\{\theta_{1}^{Y}\left(1-\theta_{1}\right)^{1-Y}\right\}^{X}\left\{\theta_{0}^{Y}\left(1-\theta_{0}\right)^{1-Y}\right\}^{1-X}\right]^{I} \\
&\quad \times\left\{\eta_{0}^{I}\left(1-\eta_{0}\right)^{1-I}\right\}^{1-X}\left\{\eta_{1}^{I}\left(1-\eta_{1}\right)^{1-I}\right\}^{X} \times \lambda^{X}(1-\lambda)^{1-X}.
\end{aligned}
$$
Deduce that the observed information for \(\theta_{1}\) based on a random sample of size \(n\) is
$$
-\frac{\partial^{2} \ell\left(\theta_{0}, \theta_{1}\right)}{\partial \theta_{1}^{2}}=\sum_{j=1}^{n} I_{j} X_{j}\left\{\frac{Y_{j}}{\theta_{1}^{2}}+\frac{1-Y_{j}}{\left(1-\theta_{1}\right)^{2}}\right\}.
$$
Give corresponding expressions for \(\partial^{2} \ell\left(\theta_{0}, \theta_{1}\right) / \partial \theta_{0}^{2}\) and \(\partial^{2} \ell\left(\theta_{0}, \theta_{1}\right) / \partial \theta_{0} \partial \theta_{1}\).

(ii) Statistician A calculates the expected information treating \(I_{1}, \ldots, I_{n}\) as fixed and thereby ignores the missing data mechanism. Show that he gets \(i_{A}\left(\theta_{1}, \theta_{1}\right)=M \lambda /\left\{\theta_{1}\left(1-\theta_{1}\right)\right\}\), where \(M=\sum I_{j}\), and find the corresponding quantities \(i_{A}\left(\theta_{0}, \theta_{1}\right)\) and \(i_{A}\left(\theta_{0}, \theta_{0}\right)\). If he uses this procedure for many sets of data, deduce that on average \(M\) is replaced by \(n \operatorname{Pr}(I=1)=n\left\{\lambda \eta_{1}+(1-\lambda) \eta_{0}\right\}\).

(iii) Statistician B calculates the expected information taking into account the missingness mechanism. Show that she gets \(i_{B}\left(\theta_{1}, \theta_{1}\right)=n \lambda \eta_{1} /\left\{\theta_{1}\left(1-\theta_{1}\right)\right\}\), and obtain \(i_{B}\left(\theta_{0}, \theta_{1}\right)\) and \(i_{B}\left(\theta_{0}, \theta_{0}\right)\).

(iv) Show that A and B get the same expected information matrices only if \(Y\) is missing completely at random. Does this accord with the discussion above?

(c) Statistician C argues that expected information should never be used in data analysis: even if the data actually observed are complete, unless it can be guaranteed that data could not be missing at random for any reason, every expected information calculation should involve every potential missingness mechanism. Such a guarantee is impossible in practice, so no expected information calculation is ever correct. Do you agree?

(Kenward and Molenberghs, 1998)
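The closed-form observed information in (b)(i) is easy to check numerically. The sketch below is not part of the original exercise: it simulates data under the stated model and compares the displayed formula for \(-\partial^{2}\ell/\partial\theta_{1}^{2}\) with a finite-difference second derivative of the \(\theta_{1}\) part of the log likelihood. All parameter values and variable names are illustrative assumptions.

```python
# Minimal sketch, assuming illustrative parameter values (not from the text):
# check the closed-form observed information for theta_1 against a
# finite-difference second derivative of the log likelihood.
import numpy as np

rng = np.random.default_rng(0)
n, lam, theta0, theta1, eta0, eta1 = 500, 0.4, 0.3, 0.7, 0.9, 0.5

X = rng.binomial(1, lam, n)
Y = rng.binomial(1, np.where(X == 1, theta1, theta0))
I = rng.binomial(1, np.where(X == 1, eta1, eta0))   # I = 1 means Y is observed

def loglik_theta1(t1):
    """Terms of the log likelihood that involve theta_1 (the I_j X_j terms)."""
    return np.sum(I * X * (Y * np.log(t1) + (1 - Y) * np.log(1 - t1)))

# Closed-form observed information for theta_1 from part (b)(i)
obs_info = np.sum(I * X * (Y / theta1**2 + (1 - Y) / (1 - theta1)**2))

# Finite-difference approximation to -d^2 ell / d theta_1^2
h = 1e-5
numeric = -(loglik_theta1(theta1 + h) - 2 * loglik_theta1(theta1)
            + loglik_theta1(theta1 - h)) / h**2

print(obs_info, numeric)   # the two values should agree closely
```

Note that the check only needs the terms multiplied by \(I_j X_j\), since the remaining factors of the likelihood contribution do not involve \(\theta_{1}\).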
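Parts (ii)–(iv) can likewise be illustrated by simulation. The hedged sketch below (again with illustrative parameter values, not taken from the text) averages statistician A's information \(i_{A}(\theta_{1},\theta_{1})=M\lambda/\{\theta_{1}(1-\theta_{1})\}\) over repeated data sets and compares it with statistician B's \(i_{B}(\theta_{1},\theta_{1})=n\lambda\eta_{1}/\{\theta_{1}(1-\theta_{1})\}\); on average \(M\) is replaced by \(n\{\lambda\eta_{1}+(1-\lambda)\eta_{0}\}\), so the two coincide only when \(\eta_{0}=\eta_{1}\), the missing-completely-at-random case of (iv).

```python
# Monte Carlo sketch, assuming illustrative parameter values (not from the text):
# compare statistician A's expected information (conditioning on M = sum I_j)
# averaged over data sets with statistician B's (averaging over the missingness
# mechanism).  They agree only when eta0 == eta1, i.e. under MCAR.
import numpy as np

rng = np.random.default_rng(1)
n, lam, theta1 = 1000, 0.4, 0.7

def average_info_A(eta0, eta1, reps=2000):
    """Average over repeated data sets of i_A = M * lam / {theta1 (1 - theta1)}."""
    X = rng.binomial(1, lam, (reps, n))
    I = rng.binomial(1, np.where(X == 1, eta1, eta0))
    M = I.sum(axis=1)
    return M.mean() * lam / (theta1 * (1 - theta1))

def info_B(eta0, eta1):
    """i_B = n * lam * eta1 / {theta1 (1 - theta1)}, from part (b)(iii)."""
    return n * lam * eta1 / (theta1 * (1 - theta1))

# MCAR (eta0 == eta1): A's average and B agree up to Monte Carlo error
print(average_info_A(0.6, 0.6), info_B(0.6, 0.6))
# Missingness depending on X (eta0 != eta1): the two no longer agree
print(average_info_A(0.9, 0.5), info_B(0.9, 0.5))
```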