6.2 Ignorable or nonignorable?

Recall from Section 2.2.6 that the assumption of ignorability is essentially the belief that the available data are sufficient to correct for the missing data. There are two main strategies that we might pursue if the response mechanism is nonignorable:

Expand the data in the imputation model in the hope of making the missing data mechanism closer to MAR, or
Formulate and fit a nonignorable imputation model and perform sensitivity analysis on the critical parameters.

Collins, Schafer, and Kam (2001) remarked that it is a “safe bet” there will be lurking variables \(Z\) that are correlated both with the variables of interest \(Y\) and with the missingness of \(Y\). The important question, however, is whether these correlations are strong enough to produce substantial bias if no measures are taken. Collins, Schafer, and Kam (2001) performed simulations that provided some answers in the case of linear regression. If the missing data rate did not exceed 25% and the correlation between \(Z\) and \(Y\) was 0.4, omitting \(Z\) from the imputation model had a negligible effect. For more extreme situations, with 50% missing data and/or a correlation of 0.9, the effect depended strongly on the form of the missing data mechanism. When the probability of being missing was linear in \(Z\) (like MARRIGHT in Section 3.2.4), omitting \(Z\) from the imputation model only affected the intercept, whereas the regression weights and variance estimates were unaffected. When more missing data were created in the extremes (like MARTAIL), the reverse occurred: omitting \(Z\) affected the regression coefficients and variance estimates, but the intercept was unbiased with the correct confidence interval. In summary, all estimates under multiple imputation were remarkably robust against MNAR in many instances. Beyond a correlation of 0.4 or a missing data rate of 25%, the form of the missing data mechanism determines which parameters are affected.
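The role of the correlation and the missing data rate can be illustrated with a small simulation. The sketch below is not a reproduction of the Collins, Schafer, and Kam (2001) design. It makes several simplifying assumptions: \(Z\) and \(Y\) are standard bivariate normal with correlation `rho`, the probability of missing \(Y\) rises linearly with the normal rank of \(Z\) (a stand-in for a MARRIGHT-like mechanism), and the observed-case mean of \(Y\) is used as a proxy for what an imputation model without \(Z\) and without other predictors would recover. The function name and its arguments are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def mean_bias_when_z_omitted(rho, miss_rate, n=20000, seed=1):
    """Bias in the mean of Y when missingness depends on an omitted Z.

    Assumptions (illustrative only):
    - Z and Y are standard normal with correlation `rho`.
    - P(Y missing | Z) = 2 * miss_rate * Phi(Z), so the average
      missing data rate equals `miss_rate` (requires miss_rate <= 0.5).
    - The observed-case mean stands in for an analysis that ignores Z.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    all_y, observed_y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # Build Y so that corr(Z, Y) = rho and var(Y) = 1
        y = rho * z + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        all_y.append(y)
        p_miss = 2.0 * miss_rate * phi(z)
        if rng.random() >= p_miss:
            observed_y.append(y)
    # Difference between the Z-ignorant estimate and the full-data mean
    return fmean(observed_y) - fmean(all_y)


if __name__ == "__main__":
    for rho in (0.4, 0.9):
        for miss_rate in (0.25, 0.5):
            bias = mean_bias_when_z_omitted(rho, miss_rate)
            print(f"rho={rho}, missing={miss_rate:.0%}: bias={bias:.3f}")
```

Under these assumptions the bias is expressed in standard deviation units of \(Y\), and it grows with both `rho` and `miss_rate`. Studying regression weights or a MARTAIL-like mechanism would require adding a predictor and changing the line that computes `p_miss`.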

Based on these results, we suggest the following guidelines. The MAR assumption is often a suitable starting point. If the MAR assumption is suspect for the data at hand, the next step is to find additional data that are strongly predictive of the missingness, and to include these in the imputation model. If all possibilities for such data are exhausted and the assumption is still suspect, perform a concise simulation study as in Collins, Schafer, and Kam (2001), customized for the problem at hand, with the goal of finding out how extreme the MNAR mechanism needs to be to influence the parameters of scientific interest. Finally, use a nonignorable imputation model (cf. Section 3.8) to correct the direction of imputations created under MAR. Vary the most critical parameters, and study their influence on the final inferences. Section 9.2 contains an example of how this can be done in practice.
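One simple way to vary a critical parameter is to shift the imputations created under MAR by a fixed amount and watch how the estimate of interest responds. The sketch below illustrates this idea under strong simplifications, and it is not the procedure of Section 3.8 or Section 9.2. It assumes a single predictor `x`, imputes missing `y` from a least-squares line plus normal noise without drawing the regression parameters (so it is not proper multiple imputation), and adds a hypothetical shift `delta` to every imputed value. All names are illustrative.

```python
import random
from statistics import fmean


def delta_adjusted_means(x, y, deltas, m=20, seed=1):
    """Return {delta: mean of y averaged over m completed datasets}.

    `y` uses None for missing values. Imputations come from a simple
    linear regression of y on x fitted to the observed cases, plus
    normal noise, plus the shift `delta` (delta = 0 corresponds to MAR).
    """
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xbar = fmean(xi for xi, _ in obs)
    ybar = fmean(yi for _, yi in obs)
    sxx = sum((xi - xbar) ** 2 for xi, _ in obs)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in obs) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in obs)
    resid_sd = (rss / (len(obs) - 2)) ** 0.5

    means = {d: [] for d in deltas}
    for _ in range(m):
        # Reuse the same noise for every delta so that differences
        # between deltas reflect the shift only, not random variation
        noise = [rng.gauss(0.0, resid_sd) for _ in x]
        for d in deltas:
            completed = [
                yi if yi is not None else intercept + slope * xi + e + d
                for xi, yi, e in zip(x, y, noise)
            ]
            means[d].append(fmean(completed))
    return {d: fmean(v) for d, v in means.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.gauss(0.0, 1.0) for _ in range(200)]
    y_full = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Toy missingness: larger x makes y more likely to be missing
    y = [None if rng.random() < (0.5 if xi > 0 else 0.1) else yi
         for xi, yi in zip(x, y_full)]
    for d, est in delta_adjusted_means(x, y, [-1.0, 0.0, 1.0]).items():
        print(f"delta={d:+.1f}: estimated mean of y = {est:.3f}")
```

Because the noise is shared across values of `delta`, the estimated mean moves by `delta` times the fraction of missing values. If the scientific conclusion survives the range of `delta` considered plausible for the problem, the MAR-based inference can be regarded as reasonably robust in that respect.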