The pool() function combines the estimates from m repeated complete data analyses. The typical sequence of steps to do a multiple imputation analysis is:

  1. Impute the missing data by the mice() function, resulting in a multiply imputed data set (class mids);

  2. Fit the model of interest (scientific model) on each imputed data set by the with() function, resulting in an object of class mira;

  3. Pool the estimates from each model into a single set of estimates and standard errors, resulting in an object of class mipo;

  4. Optionally, compare pooled estimates from different scientific models by the pool.compare() function.

A common error is to reverse steps 2 and 3, i.e., to pool the multiply-imputed data instead of the estimates. Doing so may severely bias the estimates of scientific interest and yield incorrect statistical intervals and p-values. The pool() function will detect this case.

pool(object, dfcom = NULL)

Arguments

object

An object of class mira (produced by with.mids() or as.mira()), or a list with model fits.

dfcom

A positive number representing the degrees of freedom in the complete-data analysis. The default (dfcom = NULL) is to extract this information from the first fitted model. When that fails, the warning "Large sample assumed" is printed and dfcom is set to 999999. Use the dfcom argument to specify the correct degrees of freedom.
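As a minimal sketch of supplying dfcom by hand, assuming the mice package is installed and using its bundled nhanes data: the complete-data residual degrees of freedom are read from one fitted model and passed to pool() explicitly. The object names imp, fit, df_complete, and pooled are illustrative only.

```r
library(mice)

# Impute, then fit the analysis model on each completed data set
imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 123)
fit <- with(imp, lm(bmi ~ hyp + chl))

# Complete-data residual df: number of rows minus number of coefficients
df_complete <- df.residual(fit$analyses[[1]])

# Pass the value explicitly instead of relying on automatic extraction
pooled <- pool(fit, dfcom = df_complete)
summary(pooled)
```

Setting dfcom this way matters mostly for model classes from which the residual degrees of freedom cannot be extracted automatically.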

Value

An object of class mipo, which stands for 'multiple imputation pooled outcome'.

Details

The pool() function averages the estimates of the complete data model, computes the total variance over the repeated analyses by Rubin's rules (Rubin, 1987, p. 76), and computes the following diagnostic statistics per estimate:

  1. Relative increase in variance due to nonresponse r;

  2. Residual degrees of freedom for hypothesis testing df;

  3. Proportion of total variance due to missingness lambda;

  4. Fraction of missing information fmi.
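For a single scalar estimate, the quantities above can be written out as follows. The notation is the one commonly used in the multiple imputation literature and is not defined elsewhere on this page: \hat{Q}_\ell and U_\ell denote the estimate and its squared standard error from the \ell-th completed data set, and \nu is the df of item 2.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{\ell=1}^{m} \hat{Q}_\ell, &
\bar{U} &= \frac{1}{m}\sum_{\ell=1}^{m} U_\ell, \\
B &= \frac{1}{m-1}\sum_{\ell=1}^{m} \left(\hat{Q}_\ell - \bar{Q}\right)^2, &
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B, \\
r &= \frac{(1 + 1/m)\,B}{\bar{U}}, &
\lambda &= \frac{(1 + 1/m)\,B}{T}, \\
\mathrm{fmi} &= \frac{r + 2/(\nu + 3)}{1 + r}.
\end{align*}
```

Here \bar{Q} is the pooled estimate, and \sqrt{T} is its pooled standard error.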

The function requires the following input from each fitted model:

  1. the estimates of the model, usually obtainable by coef();

  2. the standard error of each estimate;

  3. the residual degrees of freedom of the model.

The pool() function relies on the broom::tidy() and broom::glance() functions to extract this information from a list of fitted models.
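The following sketch illustrates what these inputs look like and how they combine. It assumes the mice and broom packages are installed and pools lm() fits by hand. It is not the internal implementation of pool(), and the names tidied, est, u, and manual are made up for the illustration.

```r
library(mice)

imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 123)
fit <- with(imp, lm(bmi ~ hyp + chl))

# Per-model inputs: estimates and standard errors via tidy(),
# residual degrees of freedom via glance()
tidied <- lapply(fit$analyses, broom::tidy)
dfcom  <- broom::glance(fit$analyses[[1]])$df.residual

est <- sapply(tidied, function(x) x$estimate)     # coefficients x m matrix
u   <- sapply(tidied, function(x) x$std.error^2)  # within-imputation variances
m   <- ncol(est)

qbar  <- rowMeans(est)            # pooled estimate
ubar  <- rowMeans(u)              # average within-imputation variance
b     <- apply(est, 1, var)       # between-imputation variance
total <- ubar + (1 + 1 / m) * b   # total variance

manual <- data.frame(
  term      = tidied[[1]]$term,
  estimate  = qbar,
  std.error = sqrt(total)
)
manual
```

The estimate and std.error columns of manual should agree with those reported by summary(pool(fit)) for the same fit object.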

The degrees of freedom calculation uses the Barnard-Rubin adjustment for small samples (Barnard and Rubin, 1999).
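Written out, the adjustment combines the classical large-sample degrees of freedom with an observed-data term that depends on the complete-data degrees of freedom. In this sketch \nu_{com} corresponds to dfcom and \lambda is the proportion of total variance due to missingness. The subscripted names are notation introduced here for illustration.

```latex
\begin{align*}
\nu_{old} &= \frac{m - 1}{\lambda^2}, \\
\nu_{obs} &= \frac{\nu_{com} + 1}{\nu_{com} + 3}\,\nu_{com}\,(1 - \lambda), \\
\nu &= \frac{\nu_{old}\,\nu_{obs}}{\nu_{old} + \nu_{obs}}.
\end{align*}
```

As \nu_{com} grows large, \nu_{obs} dominates \nu_{old} and \nu approaches \nu_{old}, which is why a very large dfcom amounts to a large-sample assumption.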

References

Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.

van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://www.jstatsoft.org/v45/i03/

Examples

# pool using the classic MICE workflow
imp <- mice(nhanes, maxit = 2, m = 2)
#>
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl
fit <- with(data = imp, exp = lm(bmi ~ hyp + chl))
summary(pool(fit))
#>                estimate  std.error  statistic       df      p.value
#> (Intercept) 23.32410355 4.53557796  5.1424766 14.48526 4.792246e-05
#> hyp         -1.87985416 1.94710003 -0.9654636 18.50412 3.457147e-01
#> chl          0.02851159 0.01956345  1.4573906 20.22748 1.603594e-01