## 4.3 Monotone data imputation

### 4.3.1 Overview

Imputations of monotone missing data can be generated by specifying a sequence of univariate methods (one for each incomplete column), and then sequentially drawing synthetic observations under each method. Suppose that the variables \(Y_1,\dots,Y_p\) are ordered into a monotone missing data pattern. The generally recommended procedure is as follows (Rubin 1987b, 172). The missing values of \(Y_1\) are imputed from a (possibly empty) set of complete covariates \(X\), ignoring \(Y_2,\dots,Y_p\). Next, the missing values of \(Y_2\) are imputed from \((X, Y_1)\), ignoring \(Y_3,\dots,Y_p\), and so on. The procedure ends after \(Y_p\) is imputed from \((X,Y_1,\dots,Y_{p-1})\). The univariate imputation methods discussed in Chapter 3 can be used as building blocks. For example, \(Y_1\) can be imputed by logistic regression, \(Y_2\) by predictive mean matching, and so on.

*Numerical example*. The first three columns of the data frame `nhanes2` in `mice` have a monotone missing data pattern. In terms of the above notation, \(X\) contains the complete variable `age`, \(Y_1\) is the variable `hyp` and \(Y_2\) is the variable `bmi`. Monotone data imputation can be applied to generate \(m=2\) complete datasets by:

```
data <- nhanes2[, 1:3]
md.pattern(data, plot = FALSE)
```

```
   age hyp bmi   
16   1   1   1  0
 1   1   1   0  1
 8   1   0   0  2
     0   8   9 17
```

```
imp <- mice(data, visit = "monotone", maxit = 1, m = 2,
            print = FALSE)
```

The `md.pattern()` function outputs the three available data patterns in `data`. There are 16 complete rows, one row with missing `bmi`, and eight rows where both `bmi` and `hyp` are missing. The argument `visit = "monotone"` specifies that the visit sequence should order the variables by their number of missing values (so first `hyp` and then `bmi`). Since one iteration is enough, we use `maxit = 1` to limit the calculations. This code imputes `hyp` by logistic regression and `bmi` by predictive mean matching, the default methods for binary and continuous data, respectively.

Monotone data imputation requires that the missing data pattern is monotone. In addition, there is a second, more technical requirement: the parameters of the imputation models should be *distinct* (Rubin 1987b, 174–78). Let the \(j^\mathrm{th}\) imputation model be denoted by \(P(Y_j^\mathrm{mis}|X,Y_1,\dots,Y_{j-1},\phi_j)\), where \(\phi_j\) represents the unknown parameters of the imputation model. For valid likelihood inferences, \(\phi_1,\dots,\phi_p\) should be distinct in the sense that the parameter space of \(\phi = (\phi_1,\dots,\phi_p)\) in the multivariate model for the data is the cross-product of the individual parameter spaces (Schafer 1997, 219). For Bayesian inference, it is required that the prior density of all parameters \(\pi(\phi)\) factors into \(p\) independent densities, \(\pi(\phi) = \pi_1(\phi_1)\pi_2(\phi_2)\cdots\pi_p(\phi_p)\) (Schafer 1997, 224). In most applications these requirements are unlikely to limit the practical usefulness of the method because the parameters are typically unrelated and allowed to vary freely. We need to be aware, however, that monotone data imputation may fail if the parameters of the imputation models for different \(Y_j\) somehow depend on each other.

### 4.3.2 Algorithm

**Algorithm 4.1 (Monotone data imputation of multivariate missing data.\(^\spadesuit\))**

- Sort the data \(Y_j^\mathrm{obs}\) with \(j=1,\dots,p\) according to their missingness.

- Draw \(\dot\phi_1 \sim P(Y_1^\mathrm{obs}|X)\).

- Impute \(\dot Y_1 \sim P(Y_1^\mathrm{mis}|X,\dot\phi_1)\).

- Draw \(\dot\phi_2 \sim P(Y_2^\mathrm{obs}|X,\dot Y_1)\).

- Impute \(\dot Y_2 \sim P(Y_2^\mathrm{mis}|X,\dot Y_1,\dot\phi_2)\).

- \(\vdots\)

- Draw \(\dot\phi_p \sim P(Y_p^\mathrm{obs}|X,\dot Y_1,\dots,\dot Y_{p-1})\).

- Impute \(\dot Y_p \sim P(Y_p^\mathrm{mis}|X,\dot Y_1,\dots,\dot Y_{p-1},\dot\phi_p)\).

Algorithm 4.1 provides the main steps of monotone data imputation. We order the variables according to their missingness, and impute from left to right. In practice, a pair of “draw-impute” steps is executed by one of the univariate methods of Chapter 3. Both Bayesian sampling and bootstrap imputation methods can be used, and can in fact be mixed. There is no need to iterate, and convergence is immediate. The algorithm is replicated \(m\) times from different starting points to obtain \(m\) multiply imputed datasets.
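In `mice` these draw-impute pairs are executed internally by the univariate methods, so users never write the loop themselves. As a language-neutral illustration only, the sketch below runs Algorithm 4.1 for two incomplete continuous variables under an assumed normal linear imputation model with a flat prior; the function name, the toy data and the model are all assumptions made for this example, not `mice` code:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_impute(X_obs, y_obs, X_mis, rng):
    """One 'draw-impute' pair: draw (beta, sigma^2) from the posterior of a
    normal linear model fitted to the observed rows (flat prior), then
    impute the missing rows from the predictive distribution given that draw."""
    n, k = X_obs.shape
    beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    resid = y_obs - X_obs @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n - k)        # draw sigma^2
    cov = sigma2 * np.linalg.inv(X_obs.T @ X_obs)
    beta = rng.multivariate_normal(beta_hat, cov)        # draw phi_j
    return X_mis @ beta + rng.normal(0.0, np.sqrt(sigma2), len(X_mis))

# Toy monotone pattern: X complete; y1 observed on the first 50 rows,
# y2 observed on the first 40 rows, so y2 is missing wherever y1 is.
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y1 = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
y2 = 0.5 * y1 + rng.normal(size=n)
m1, m2 = 50, 40

# Impute left to right: first Y1 from X, then Y2 from (X, completed Y1).
y1_imp = y1.copy()
y1_imp[m1:] = draw_impute(X[:m1], y1[:m1], X[m1:], rng)
X2 = np.column_stack([X, y1_imp])
y2_imp = y2.copy()
y2_imp[m2:] = draw_impute(X2[:m2], y2[:m2], X2[m2:], rng)
```

Repeating the whole pass \(m\) times with different random draws yields the \(m\) multiply imputed datasets; no iteration is needed because each model conditions only on columns that are already complete.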

Monotone data imputation is fast and flexible, but requires a monotone pattern. In practice, a dataset may be near-monotone, and may become monotone if a small fraction of the missing data were imputed. For example, dropout may produce a monotone pattern, while a few items of unplanned missing data destroy it. In such cases it can be computationally efficient to impute the data in two steps. First, fill in the missing data in a small portion of the data to restore the monotone pattern, and then apply the monotone data imputation (Li 1988; Rubin and Schafer 1990; Liu 1993; Schafer 1997; Rubin 2003). There is often more than one way to impute toward monotonicity, so a choice is necessary. Rubin and Schafer (1990) suggested ordering the variables according to the missing data rate.
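The suggested ordering is easy to compute. As an illustration outside `mice` (the toy matrix and variable names are assumptions for this example), a short Python sketch sorts the columns by their missing-data count, which yields the left-to-right visit sequence for the monotone pass:

```python
import numpy as np

# toy data matrix; np.nan marks a missing value
# column missing counts are 1, 0 and 2 respectively
Y = np.array([[5.0, 4.0, 6.0],
              [3.0, 2.0, np.nan],
              [np.nan, 1.0, np.nan]])

# order the columns by missing-data rate, fewest missing first
order = np.argsort(np.isnan(Y).sum(axis=0))
Y_sorted = Y[:, order]
```

Here `order` places the complete column first and the most incomplete column last, as Rubin and Schafer's ordering prescribes.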

*Numerical example*. The `nhanes2` data in `mice` contains 3 out of 27 missing values that destroy the monotone pattern: one for `hyp` (in row 6) and two for `bmi` (in rows 3 and 6). The two-step procedure first imputes these 3 values by a simple random sample, and then fills in the remaining missing data by monotone data multiple imputation.

The primary advantage is speed. We need to make only two passes through the data. Since the method uses single imputation in the first step, it should be done only if the number of missing values that destroy the monotone pattern is small.
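The first step of the two-step procedure can be sketched in isolation. The helper below is an illustrative Python sketch (the function name, toy matrix and cell indices are assumptions, not `mice` code): it restores monotonicity by filling each pattern-breaking cell with a random draw from the observed values in its column, after which Algorithm 4.1 applies:

```python
import numpy as np

def fill_to_monotone(Y, breaking_cells, rng):
    """Single-imputation first step: fill the few (row, column) cells that
    break the monotone pattern with random draws from the observed values
    in the same column."""
    Y = Y.copy()
    for i, j in breaking_cells:
        observed = Y[~np.isnan(Y[:, j]), j]
        Y[i, j] = rng.choice(observed)
    return Y

rng = np.random.default_rng(0)
# columns ordered as (age, hyp, bmi); np.nan marks a missing value
Y = np.array([[1.0, 1.0, 22.7],
              [2.0, np.nan, np.nan],   # consistent with a monotone pattern
              [1.0, np.nan, 30.1]])    # hyp missing, bmi observed: breaks it

Y_mono = fill_to_monotone(Y, breaking_cells=[(2, 1)], rng=rng)
# Y_mono has a monotone pattern; the monotone pass imputes the rest
```

Because this first step is single imputation, its share of the missing data should stay small, as the text notes.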

Observe that the imputed values for the missing `hyp` data in row 3 could also depend on `bmi` and `chl`, but in the procedure both predictors are ignored. In principle, we can improve the method by incorporating `bmi` and `chl` into the model, and then iterate. We will explore this technique in more detail in Section 4.5, but first we study the theoretically nice alternative.