## 2.4 Statistical intervals and tests

### 2.4.1 Scalar or multi-parameter inference?

The ultimate objective of multiple imputation is to provide valid statistical estimates from incomplete data. For scalar $$Q$$, it is straightforward to calculate confidence intervals and $$p$$-values from multiply imputed data, the primary difficulty being the derivation of the appropriate degrees of freedom for the $$t$$- and $$F$$-distributions. Section 2.4.2 provides the relevant statistical procedures.

If $$Q$$ is a vector, we have two options for analysis. The first option is to calculate confidence intervals and $$p$$-values for the individual elements in $$Q$$, and do all statistical tests per element. Such repeated-scalar inference is appropriate if we interpret each element as a separate, though perhaps related, model parameter. In this case, the test uses the fraction of missing information particular to each parameter.

The alternative option is to perform one statistical test that involves all elements of $$Q$$ at once. This is appropriate in the context of multi-parameter or simultaneous inference, where we evaluate combinations of model parameters. Practical applications of such tests include the comparison of nested models and the testing of model terms that involve multiple parameters, such as regression estimates for dummy codings created from the same variable.

All methods assume that, under repeated sampling and with complete data, the parameter estimates $$\hat Q$$ are normally distributed around the population value $$Q$$ as

$$$\hat Q \sim N(Q, U) \tag{2.33}$$$

where $$U$$ is the variance-covariance matrix of $$(Q-\hat Q)$$ (Rubin 1987b, 75). For scalar $$Q$$, the quantity $$U$$ reduces to $$\sigma_m^2$$, the variance of the estimate $$\hat Q$$ over repeated samples. Observe that $$U$$ is not the variance of the measurements.
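The distinction between $$U$$ and the variance of the measurements can be checked with a small simulation. The sketch below is illustrative only: the sample size, mean, standard deviation and seed are hypothetical values, and the estimate $$\hat Q$$ is taken to be a sample mean from complete data.

```r
set.seed(123)                 # arbitrary seed, for reproducibility only
n     <- 50                   # hypothetical sample size
sigma <- 2                    # hypothetical standard deviation of the measurements

# Draw 5000 complete-data samples and store the estimate (the mean) of each
q_hat <- replicate(5000, mean(rnorm(n, mean = 10, sd = sigma)))

var_estimate     <- var(q_hat)    # empirical U: variance of the estimate
var_measurements <- sigma^2       # variance of the measurements
theoretical_U    <- sigma^2 / n   # sampling variance of a mean

c(U = var_estimate, theory = theoretical_U, measurements = var_measurements)
```

The empirical variance of the estimates should be close to $$\sigma^2/n$$, which is far smaller than the variance of the measurements themselves.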

Several approaches for multi-parameter inference are available: Wald test, likelihood ratio test and $$\chi^2$$-test. These methods are more complex than single-parameter inference, and their treatment is therefore deferred to Section 5.2. The next section shows how confidence intervals and $$p$$-values for scalar parameters can be calculated from multiply imputed data.

### 2.4.2 Scalar inference

Single-parameter inference applies if $$k=1$$, or if $$k>1$$ and the test is repeated for each of the $$k$$ components. Since the total variance $$T$$ is not known a priori, $$\bar Q$$ follows a $$t$$-distribution rather than the normal. Univariate tests are based on the approximation

$$$\frac{Q-\bar Q}{\sqrt{T}} \sim t_\nu \tag{2.34}$$$

where $$t_\nu$$ is the Student’s $$t$$-distribution with $$\nu$$ degrees of freedom, with $$\nu$$ defined by Equation (2.32).

The $$100(1-\alpha)$$% confidence interval for $$Q$$ is calculated as

$$$\bar Q \pm t_{\nu,1-\alpha/2}\sqrt{T} \tag{2.35}$$$

where $$t_{\nu,1-\alpha/2}$$ is the quantile corresponding to probability $$1-\alpha/2$$ of $$t_\nu$$. For example, use $$t_{10,0.975}=2.23$$ for the 95% confidence interval with $$\nu=10$$.
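Equation (2.35) can be evaluated with base R. The following is a minimal sketch: the variable names are our own, and the input values are the rounded estimate, standard error and degrees of freedom of the age coefficient reported in Section 2.4.3, so the result agrees with the reported interval only up to rounding.

```r
# Rounded pooled results for one scalar parameter (age, Section 2.4.3)
qbar  <- -2.13   # pooled estimate, \bar Q
se    <- 1.08    # sqrt(T), square root of the total variance
nu    <- 15.1    # degrees of freedom
alpha <- 0.05    # 1 - alpha is the nominal coverage

# Quantile t_{nu, 1 - alpha/2} of the Student's t-distribution
t_crit <- qt(1 - alpha / 2, df = nu)

# Lower and upper limit of the confidence interval, Equation (2.35)
ci <- qbar + c(-1, 1) * t_crit * se
ci

# The quantile quoted in the text for nu = 10
qt(0.975, df = 10)
```

Note that qt() accepts non-integer degrees of freedom, which is needed here because $$\nu$$ is generally not an integer.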

Suppose we test the null hypothesis $$Q=Q_0$$ for some specified value $$Q_0$$. We can find the $$p$$-value of the test as the probability

$$$P_s = \Pr\left[F_{1,\nu} > \frac{(Q_0 - \bar Q)^2}{T}\right] \tag{2.36}$$$

where $$F_{1,\nu}$$ is an $$F$$-distribution with 1 and $$\nu$$ degrees of freedom.
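As a sketch of Equation (2.36), the $$p$$-value can be computed with pf() in base R. The variable names are our own, and the inputs are again the rounded values for the age coefficient from Section 2.4.3, with $$Q_0 = 0$$. Because the square of a $$t_\nu$$ variate follows an $$F_{1,\nu}$$-distribution, the two-sided $$t$$-test gives the same answer.

```r
q0    <- 0        # value of Q under the null hypothesis
qbar  <- -2.13    # pooled estimate (rounded)
t_var <- 1.08^2   # total variance T, from the rounded standard error
nu    <- 15.1     # degrees of freedom

# Test statistic and upper-tail probability, Equation (2.36)
f_stat <- (q0 - qbar)^2 / t_var
p_f    <- pf(f_stat, df1 = 1, df2 = nu, lower.tail = FALSE)

# Equivalent two-sided p-value from the t-distribution
t_stat <- (qbar - q0) / sqrt(t_var)
p_t    <- 2 * pt(-abs(t_stat), df = nu)

c(F = f_stat, p_F = p_f, p_t = p_t)
```

The two $$p$$-values coincide up to numerical precision and are close to the value reported for age in Section 2.4.3; small differences stem from the rounded inputs.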

### 2.4.3 Numerical example

Wald tests and confidence intervals for individual elements of $$Q$$ are standard output of most statistical procedures. The mice package provides such output by running the summary() function on the mipo object created by pool():

    summary(est, conf.int = TRUE)
                estimate std.error statistic   df        p.value
    (Intercept)    30.50      2.24     13.63 12.4 0.000000000694
    age            -2.13      1.08     -1.97 15.1 0.067575739870
                2.5 % 97.5 %
    (Intercept) 25.65 35.362
    age         -4.43  0.174

The estimate and df columns are identical to the previous display. In addition, we get the standard error of the estimate, the Wald statistic, its associated $$p$$-value, and the nominal 95 percent confidence interval per parameter. In this toy example age is not a statistically significant predictor of bmi at a type I error rate of 5 percent. We may change the nominal coverage of the confidence intervals by the conf.level argument. It is possible to obtain all output by summary(est, "all", conf.int = TRUE).
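For readers who want to reproduce this kind of output, the sketch below builds a mipo object and requests 90 percent intervals. It rests on assumptions that this section does not state: that the model is a linear regression of bmi on age fitted to the nhanes data shipped with mice, and that the number of imputations and the seed are as chosen here. The numbers will therefore generally differ from the display above.

```r
library(mice)

# Assumed setup: m and seed are illustrative choices, not taken from the text
imp <- mice(nhanes, m = 10, seed = 1, printFlag = FALSE)

# Fit the complete-data model to each imputed data set and pool the results
fit <- with(imp, lm(bmi ~ age))
est <- pool(fit)

# Wald tests with 90 percent confidence intervals instead of the default 95
res <- summary(est, conf.int = TRUE, conf.level = 0.90)
res
```

A narrower nominal coverage yields shorter intervals, while the estimate, standard error, statistic, df and $$p$$-value columns are unaffected by conf.level.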