Partitioned predictive mean matching as a large data multilevel imputation technique


Large-scale assessment data often have a multilevel structure. When dealing with missing values, such structures need to be taken into account to prevent underestimation of the intraclass correlation. We evaluate predictive mean matching (PMM) as a multilevel imputation technique and compare it to other imputation approaches for multilevel data. We propose partitioned predictive mean matching (PPMM) as an extension to the PMM algorithm that divides the big data multilevel problem into manageable parts, each of which can be solved by standard predictive mean matching. We show that PPMM can be a very effective imputation approach for large multilevel datasets and that both PPMM and PMM yield plausible inference for continuous, ordered categorical, or even dichotomous multilevel data. We conclude that the performance of both PMM and PPMM is often comparable to that of dedicated methods for multilevel data.
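To make the idea concrete, the following is a minimal Python sketch of predictive mean matching and of a partitioned wrapper around it. It is an illustration under stated assumptions, not the authors' implementation. The function names `pmm_impute` and `partitioned_pmm`, the donor pool size `k`, and the rule of splitting the data by groups of clusters are all hypothetical choices made for this example; the abstract does not specify how the partitions are formed. The sketch also uses a plain least-squares fit, whereas a proper multiple imputation procedure would additionally draw the regression parameters to reflect their uncertainty.

```python
import numpy as np


def pmm_impute(x, y, k=5, rng=None):
    """Simplified PMM: fill NaN entries of y with observed donor values.

    Assumption: a single least-squares fit, without the parameter draw
    that a full multiple imputation procedure would add.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float).copy()
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    miss = np.isnan(y)
    if not miss.any() or miss.all():
        return y  # nothing to impute, or no donors available
    # Fit a linear model on the observed cases.
    beta, *_ = np.linalg.lstsq(X[~miss], y[~miss], rcond=None)
    pred_obs = X[~miss] @ beta
    pred_mis = X[miss] @ beta
    donors = y[~miss]
    k_eff = min(k, donors.size)
    imputed = np.empty(pred_mis.size)
    for i, p in enumerate(pred_mis):
        # The k observed cases whose predicted means are closest to p.
        nearest = np.argsort(np.abs(pred_obs - p))[:k_eff]
        # Borrow the observed value of one randomly chosen donor.
        imputed[i] = donors[rng.choice(nearest)]
    y[miss] = imputed
    return y


def partitioned_pmm(x, y, cluster, n_parts=2, k=5, rng=None):
    """Hypothetical partitioning: split cluster ids into n_parts groups
    and run standard PMM separately within each group."""
    rng = np.random.default_rng(rng)
    cluster = np.asarray(cluster)
    x = np.asarray(x, dtype=float)
    y_out = np.asarray(y, dtype=float).copy()
    for part in np.array_split(np.unique(cluster), n_parts):
        rows = np.isin(cluster, part)
        y_out[rows] = pmm_impute(x[rows], y_out[rows], k=k, rng=rng)
    return y_out
```

Because every imputed value is borrowed from an observed case, the imputations stay within the range and scale of the observed data, which is consistent with the abstract's claim that the approach applies to continuous, ordered categorical, and dichotomous variables. In this sketch, restricting donors to the same partition keeps each regression small and draws donors only from a subset of clusters.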

Psychological Test and Assessment Modeling
Stef van Buuren

My research interests include data science, missing data, child growth and development, and measurement.