Multiple imputation in data that grow over time: A comparison of three strategies


Multiple imputation is a highly recommended technique to deal with missing data, but the application to longitudinal datasets can be done in multiple ways. When a new wave of longitudinal data arrives, we can treat the combined data of multiple waves as a new missing data problem and overwrite existing imputations with new values (re-imputation). Alternatively, we may keep the existing imputations, and impute only the new data. We may do either a full multiple imputation (nested) or a single imputation (appended) on the new data per imputed set. This study compares these three strategies by means of simulation. All techniques resulted in valid inference under a monotone missingness pattern. A non-monotone missingness pattern led to biased and non-confidence valid regression coefficients after nested and appended imputation, depending on the correlation structure of the data. Correlations within timepoints must be stronger than correlations between timepoints to obtain valid inference. In an empirical example, the three strategies performed similarly. We conclude that appended imputation is especially beneficial in longitudinal datasets that suffer from dropout.

arXiv preprint arXiv:1904.04185
Stef van Buuren
Stef van Buuren

My research interests include data science, missing data, child growth and development, and measurement.