5.5 Parallel computation
Multiple imputation is a parallel technique. If there are \(m\) processors available, it is possible to generate the \(m\) imputed datasets, estimate the \(m\) complete-data statistics and store the \(m\) results in \(m\) independent parallel streams. The overhead needed is minimal since each stream requires the same amount of processor time. If more than \(m\) processors are available, a better alternative is to subdivide each stream into several substreams. Huge savings in execution time can be obtained in this way (Beddo 2002).
Unfortunately, R is single-threaded, so the exploitation of the parallel nature of multiple imputation is not automatic, and requires some additional work. There are currently three alternatives to perform the calculation of mice in a parallel fashion.
Gordon (2014) presents fully worked-out example code that builds upon the doParallel library, and that combines complete() and ibind(). With some programming this example can be adapted to other datasets.

The parlMICE() function is a wrapper around mice() that can divide the imputations over multiple cores or CPUs. Schouten and Vink (2017) show that substantial gains are already possible with three free cores, especially for a combination of a large number of imputations \(m\) and a large sample size \(n\).

The par.mice() function in the micemd package (Audigier and Resche-Rigon 2018) takes the same arguments as the mice() function, plus two extra arguments related to the parallel calculations. It also builds on the parallel package.
The last two options are quite similar. Application of these methods is especially beneficial for simulation studies, where the same model needs to be replicated a large number of times. Support for multi-core processing is likely to grow, so keep an eye on the Internet.
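The idea of independent parallel streams can be illustrated with a minimal sketch. It is not the code of Gordon (2014), nor the internals of parlMICE() or par.mice(). It assumes only the base parallel package and the ibind() function from mice, and it uses the nhanes data shipped with mice. The number of streams, the number of imputations per stream and the seed are arbitrary choices made for illustration.

```r
library(parallel)
library(mice)

n_streams    <- 2   # number of parallel streams (illustrative value)
m_per_stream <- 3   # imputations generated within each stream (illustrative value)

# Start one worker per stream and give each an independent random number stream
cl <- makeCluster(n_streams)
clusterSetRNGStream(cl, iseed = 123)
clusterEvalQ(cl, library(mice))

# Each worker runs mice() on the same incomplete data, independently of the others
imps <- parLapply(
  cl, seq_len(n_streams),
  function(i, data, m) mice(data, m = m, printFlag = FALSE),
  data = nhanes, m = m_per_stream
)
stopCluster(cl)

# Combine the per-stream mids objects into one mids object
imp <- Reduce(ibind, imps)

# Analyze and pool as usual
fit <- with(imp, lm(bmi ~ age))
pool(fit)
```

The combined object holds n_streams times m_per_stream imputations and can be used like any other result of mice(). Setting the random number streams on the cluster, rather than passing the same seed to every call of mice(), is what keeps the streams from producing identical imputations.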