The pool() function combines the estimates from
repeated complete data analyses. The typical sequence of steps to
do a multiple imputation analysis is:
1. Impute the missing data by the mice() function, resulting in a multiply imputed data set (class mids);
2. Fit the model of interest (scientific model) on each imputed data set by the with() function, resulting in an object of class mira;
3. Pool the estimates from each model into a single set of estimates and standard errors, resulting in an object of class mipo.
A common error is to reverse steps 2 and 3, i.e., to pool the
multiply-imputed data instead of the estimates. Doing so may severely bias
the estimates of scientific interest and yield incorrect statistical
intervals and p-values. The pool() function will detect this case.
pool(object, dfcom = NULL)
A positive number representing the degrees of freedom in the
complete-data analysis. The default (
An object of class
mipo, which stands for 'multiple imputation
The pool() function averages the estimates of the complete
data model, computes the
total variance over the repeated analyses by Rubin's rules
(Rubin, 1987, p. 76),
and computes the following diagnostic statistics per estimate:
Relative increase in variance due to nonresponse
Residual degrees of freedom for hypothesis testing
Proportion of total variance due to missingness
Fraction of missing information
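As a rough illustration of the pooling step and two of the diagnostics listed above, the following base R sketch applies Rubin's rules to a single parameter. The vectors est and se are made-up numbers standing in for the estimate and standard error from m = 3 complete-data analyses, and all variable names are illustrative assumptions rather than names used inside pool().

```r
# Hypothetical estimates and standard errors of one coefficient
# from m = 3 complete-data analyses (illustrative numbers only)
est <- c(1.0, 2.0, 3.0)
se  <- c(1.0, 1.0, 1.0)
m   <- length(est)

qbar      <- mean(est)                  # pooled estimate: average over the m analyses
ubar      <- mean(se^2)                 # within-imputation variance
b         <- var(est)                   # between-imputation variance
total_var <- ubar + (1 + 1 / m) * b     # total variance by Rubin's rules

riv    <- (1 + 1 / m) * b / ubar        # relative increase in variance due to nonresponse
lambda <- (1 + 1 / m) * b / total_var   # proportion of total variance due to missingness

pooled <- data.frame(estimate = qbar, std.error = sqrt(total_var),
                     riv = riv, lambda = lambda)
print(pooled)
```

The factor (1 + 1 / m) inflates the between-imputation variance to account for using a finite number of imputations.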
The function requires the following input from each fitted model:
the estimates of the model, usually obtainable by
the standard error of each estimate;
the residual degrees of freedom of the model.
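To make the three inputs concrete, here is a minimal base R sketch that extracts them from a single fitted linear model. It uses the built-in mtcars data purely as a stand-in for one imputed data set, and it is not necessarily how pool() obtains these quantities internally.

```r
# Fit an ordinary linear model on a built-in data set; in a real
# analysis this would be the model fitted to one imputed data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

est    <- coef(fit)                # estimates of the model
se     <- sqrt(diag(vcov(fit)))    # standard error of each estimate
df_res <- df.residual(fit)         # residual degrees of freedom of the model

print(data.frame(term = names(est), estimate = est, std.error = se,
                 row.names = NULL))
print(df_res)
```

Model classes that do not provide coef(), vcov() and df.residual() methods would need their own way of supplying the same three pieces of information.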
The degrees of freedom calculation uses the Barnard-Rubin adjustment for small samples (Barnard and Rubin, 1999).
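The adjustment can be sketched in a few lines of base R. The function name barnard_rubin_df and its argument names are hypothetical, chosen for illustration only. The sketch assumes a scalar dfcom and a strictly positive lambda (the proportion of total variance due to missingness).

```r
# Sketch of the Barnard-Rubin (1999) small-sample degrees of freedom.
# m:      number of imputations
# lambda: proportion of total variance due to missingness (must be > 0)
# dfcom:  complete-data degrees of freedom (Inf means "large sample")
barnard_rubin_df <- function(m, lambda, dfcom = Inf) {
  # Rubin (1987) large-sample degrees of freedom
  df_old <- (m - 1) / lambda^2
  if (is.infinite(dfcom)) {
    return(df_old)
  }
  # Observed-data degrees of freedom, shrunk by the missing information
  df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  # Combined value; it never exceeds dfcom
  df_old * df_obs / (df_old + df_obs)
}

print(barnard_rubin_df(m = 5, lambda = 0.5))              # large-sample case
print(barnard_rubin_df(m = 5, lambda = 0.5, dfcom = 20))  # small-sample case
```

Without the adjustment, the large-sample formula can return more degrees of freedom than the complete data would have provided, which is what the small-sample version prevents.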
Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://www.jstatsoft.org/v45/i03/
# pool using the classic MICE workflow
library(mice)
imp <- mice(nhanes, maxit = 2, m = 2)
#>
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl
# fit the scientific model on each imputed data set, then pool the estimates
fit <- with(imp, lm(bmi ~ hyp + chl))
summary(pool(fit))
#>          term   estimate  std.error  statistic        df      p.value
#> 1 (Intercept) 21.3937741 4.22217002  5.0670091 17.192034 9.211353e-05
#> 2         hyp -1.4382464 2.56874915 -0.5599014  4.912371 6.001139e-01
#> 3         chl  0.0346279 0.02516489  1.3760400  6.688939 2.131088e-01