5.5 Parallel computation
Multiple imputation is a naturally parallel technique. If there are \(m\) processors available, it is possible to generate the \(m\) imputed datasets, estimate the \(m\) complete-data statistics and store the \(m\) results in \(m\) independent parallel streams. The overhead needed is minimal since each stream requires the same amount of processor time. If more than \(m\) processors are available, a better alternative is to subdivide each stream into several substreams. Huge savings in execution time can be obtained in this way (Beddo 2002).
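The idea of independent streams can be made concrete with a minimal sketch in R that uses only the base `parallel` package. Everything in it is an assumption made for illustration: the toy data, the helper `one_stream()`, the choice of two workers, and the imputation step itself, which is plain stochastic regression imputation and a simplified stand-in for a proper imputation method. Each stream imputes, analyzes and returns its own result, and the stored results are pooled afterwards with Rubin's rules.

```r
library(parallel)

# Toy data (hypothetical): y depends on x, and 60 of the 200 y values are missing.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
y[sample(n, 60)] <- NA
dat <- data.frame(x = x, y = y)

# One stream: impute, analyze, and return the complete-data result.
# The imputation is stochastic regression imputation, which ignores
# parameter uncertainty; it is NOT what mice() does.
one_stream <- function(i, dat) {
  mis <- is.na(dat$y)
  fit <- lm(y ~ x, data = dat[!mis, ])
  dat$y[mis] <- predict(fit, newdata = dat[mis, ]) +
    rnorm(sum(mis), sd = summary(fit)$sigma)
  c(est = mean(dat$y), var = var(dat$y) / nrow(dat))
}

m <- 4
cl <- makeCluster(2)                   # assumption: two workers are available
clusterSetRNGStream(cl, iseed = 123)   # independent random number streams
res <- parLapply(cl, seq_len(m), one_stream, dat = dat)
stopCluster(cl)

# Pool the m stored results with Rubin's rules.
res   <- do.call(rbind, res)
qbar  <- mean(res[, "est"])            # pooled estimate
ubar  <- mean(res[, "var"])            # within-imputation variance
b     <- var(res[, "est"])             # between-imputation variance
total <- ubar + (1 + 1 / m) * b        # total variance
c(estimate = qbar, se = sqrt(total))
```

With two workers and \(m = 4\), each worker handles two streams. With \(m\) workers, every stream would run on its own processor.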
R is single-threaded, so the exploitation of the parallel nature of multiple imputation is not automatic and requires some additional work. There are currently three alternatives to perform the calculations of mice in a parallel fashion.
Gordon (2014) presents fully worked-out example code that builds upon the doParallel library and combines the separately imputed results with ibind(). With some programming this example can be adapted to other datasets.
The parlMICE() function is a wrapper around mice() that can divide the imputations over multiple cores or CPUs. Schouten and Vink (2017) show that substantial gains are already possible with three free cores, especially for a combination of a large number of imputations \(m\) and a large sample size \(n\).
The mice.par() function in the micemd package (Audigier and Resche-Rigon 2018) takes the same arguments as the mice() function, plus two extra arguments related to the parallel calculations. It also builds on the
The last two options are quite similar. Application of these methods is especially beneficial for simulation studies, where the same model needs to be replicated a large number of times. Support for multi-core processing is likely to grow, so keep an eye on the Internet.
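The general pattern behind these alternatives can be sketched as follows. This is a minimal illustration, not the code of Gordon (2014) and not the internals of either wrapper function. It assumes that the mice package is installed, uses the `nhanes` example data that ships with mice, and picks two workers, three imputations per worker and the seeds arbitrarily. Each worker runs `mice()` on the same data with its own seed, and the resulting objects are merged with `ibind()`.

```r
library(parallel)
library(mice)

n_workers    <- 2            # assumption: two free cores
m_per_worker <- 3            # imputations per worker, so m = 6 in total
seeds        <- c(101, 102)  # arbitrary, one distinct seed per worker

cl <- makeCluster(n_workers)
invisible(clusterEvalQ(cl, library(mice)))  # attach mice on every worker

# Each worker imputes the same incomplete data with its own seed.
imp_list <- parLapply(cl, seeds, function(seed, m) {
  mice(nhanes, m = m, seed = seed, printFlag = FALSE)
}, m = m_per_worker)
stopCluster(cl)

# Merge the per-worker objects into one object holding all imputations.
imp <- Reduce(ibind, imp_list)

# Analyze and pool as usual.
fit <- with(imp, lm(bmi ~ age))
summary(pool(fit))
```

Giving each worker a distinct seed keeps the imputations different across workers while leaving the run reproducible.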