## 7.6 Continuous outcome

This section discusses the problem of how to create multiple imputations under the multilevel model when missing values occur in the outcome \(y_{c}\) only.

### 7.6.1 General principle

Imputations under the linear mixed model in Equation (7.6) can be generated by taking draws from the posterior distribution of the model parameters, followed by a step that draws the imputations. The sequence of steps is:

1. Fit model (7.6) to the observed data to obtain estimates \(\hat\sigma^2\), \(\hat\beta\), \(\hat\Omega\) and \(\hat u_c\);
2. Generate random draws \(\dot\sigma^2\), \(\dot\beta\), \(\dot\Omega\) and \(\dot u_c\) from their posteriors;
3. For each missing value, calculate its expected value under the drawn model, and add a randomly drawn error to it.

These steps form a template for any multilevel imputation method. For step 1 we may use standard multilevel software. Step 2 is needed to properly account for the uncertainty of the model, and this generally requires custom software that generates the draws. Alternatively, we may use bootstrapping to incorporate model uncertainty, although this is more complex than usual since resampling is needed at two levels (Goldstein 2011a). Once we have obtained the parameter draws, generating the imputation is straightforward. It is also possible to use predictive mean matching in step 3.
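The three steps above can be sketched in R. This is a minimal illustration under simplifying assumptions, not the exact algorithm of any package: only a random-intercept model is used, the posterior draw in step 2 is a crude normal approximation for the fixed effects only (the residual variance and random effects are kept at their estimates), and every cluster is assumed to contain at least one observed outcome.

```r
# Minimal sketch of the draw-then-impute template for a random-intercept model
library(lme4)
library(MASS)

impute_mixed <- function(y, x, cluster) {
  obs <- !is.na(y)
  fit <- lmer(y ~ x + (1 | cluster), subset = obs)        # step 1: fit
  # step 2 (partial): approximate posterior draw of the fixed effects
  beta.dot <- mvrnorm(1, fixef(fit), as.matrix(vcov(fit)))
  u <- ranef(fit)$cluster                                 # estimated u_c
  # step 3: expected value under the drawn model plus random error
  mis <- which(!obs)
  mu <- beta.dot[1] + beta.dot[2] * x[mis] +
    u[as.character(cluster[mis]), 1]
  mu + rnorm(length(mis), mean = 0, sd = sigma(fit))
}
```

A full implementation would also redraw \(\sigma^2\), \(\Omega\) and \(u_c\) from their posteriors, as the methods discussed below do.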

We illustrate these steps by the method proposed by Jolani (2018) for imputing mixes of systematically and sporadically missing values. Step 1 of that method consists of calling the `lmer()` function from the `lme4` package to fit the model. Step 2 successively draws \(\dot\sigma^2\), \(\dot\beta\) and \(\dot\Omega\) under a normal-inverse-Wishart prior for \(\Omega\), and draws \(\dot u_c\) from the conditional normal model for sporadically missing data and from an unconditional normal model for systematically missing values. See the paper for the exact specification of these steps. The expected value of the missing entry is then calculated, and a random draw from the residual error distribution is added to create the imputation. These steps are implemented as the `2l.lmer` method in `mice`.

Predictive mean matching works well for single-level continuous data, and is also an interesting option for imputing multilevel data. The idea is to calculate predicted values under the linear mixed model for all level-1 units. Level-1 units with observed outcomes are selected as potential donors depending on their distance in the predictive metric. It is up to the imputer to specify whether or not donors should be restricted to the same cluster. Drawing donors from all clusters makes the procedure behave more like a homoscedastic method, whereas drawing values inside the cluster may preserve heteroscedastic features of the data. So if preserving such heterogeneity is important, draws should be made locally. Intuitively, drawing from one's own cluster should be done only if the cluster is relatively large, so that the procedure can find enough good matches. If different clusters come from different reporting systems, e.g., centimeters versus converted inches, the imputer might wish to preserve such features by restricting draws to the local cluster. If clusters are geographically ordered, then one may try to preserve unmeasured local features by restricting donors to the neighboring clusters. Vink, Lazendic and Van Buuren (2015) present an application that exploits this feature.
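As a toy illustration of the local-versus-global donor choice (this is not code from any of the packages discussed in this chapter), within-cluster predictive mean matching can be sketched as follows, assuming `yhat` holds predicted values under the linear mixed model for all level-1 units:

```r
# Toy within-cluster predictive mean matching: for each missing case, draw a
# donor among observed cases in the same cluster, matched on predicted value
pmm_local <- function(yhat, y, cluster, donors = 5) {
  mis <- which(is.na(y))
  sapply(mis, function(i) {
    cand <- which(!is.na(y) & cluster == cluster[i])   # local donor pool
    near <- cand[order(abs(yhat[cand] - yhat[i]))]     # closest predictions
    pool <- near[seq_len(min(donors, length(near)))]
    y[pool[sample.int(length(pool), 1)]]               # draw one donor
  })
}
```

Drawing from all clusters instead amounts to dropping the `cluster == cluster[i]` condition from the donor pool.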

### 7.6.2 Methods

The `mice`, `miceadds`, `micemd` and `mitml` packages contain useful functions for multilevel imputation. The `mice` package implements two methods, `2l.lmer` and `2l.pan`. Method `2l.lmer` (Jolani 2018) imputes both sporadically and systematically missing values. Under the appropriate model, the method is randomization-valid for the fixed effects, but the variance components are more difficult to estimate, especially for a small number of clusters. Method `2l.pan` uses the PAN method (Schafer and Yucel 2002). Method `2l.continuous` from `miceadds` is similar to `2l.lmer`, with some different options. The `2l.jomo` method from `micemd` is similar to `2l.pan`, but uses the `jomo` package as the computational engine. Method `2l.glm.norm` is similar to `2l.continuous` and `2l.lmer`.

Two functions for heteroscedastic errors are available. Method `2l.2stage.norm` from `micemd` implements the two-stage method by Resche-Rigon and White (2018). The `2l.norm` method from `mice` implements the Gibbs sampler from Kasim and Raudenbush (1998). Method `2l.norm` can recover the intra-class correlation quite well, even for severe MAR cases and high amounts of missing data in the outcome or the predictor. However, it is fairly slow and fails to achieve nominal coverage for the fixed effects for small classes (Van Buuren 2011).

The `2l.pmm` method in the `miceadds` package generalizes the default `pmm` method to data with two levels, using a linear mixed model fitted by `lmer` or `blmer`. Method `2l.2stage.pmm` generalizes `pmm` by a two-stage method. The default in both methods is to obtain donors across all clusters, which is probably fine for most applications.

Table 7.2 presents an overview of `R` functions for univariate imputation according to a multilevel model for continuous outcomes. Each row represents a function. The functions belong to different packages, and there is overlap in functionality. All functions can be called from `mice()` as building blocks to form an iterative FCS algorithm.

| Package | Method | Description |
|---|---|---|
| Continuous | | |
| `mice` | `2l.lmer` | normal, `lmer` |
| `mice` | `2l.pan` | normal, `pan` |
| `miceadds` | `2l.continuous` | normal, `lmer`, `blme` |
| `micemd` | `2l.jomo` | normal, `jomo` |
| `micemd` | `2l.glm.norm` | normal, `lmer` |
| `mice` | `2l.norm` | normal, heteroscedastic |
| `micemd` | `2l.2stage.norm` | normal, heteroscedastic |
| Generic | | |
| `miceadds` | `2l.pmm` | pmm, homoscedastic, `lmer` |
| `micemd` | `2l.2stage.pmm` | pmm, heteroscedastic, `mvmeta` |

### 7.6.3 Example

We use the `brandsma` data introduced in Section 7.2. Here we will analyze the full set of 4016 pupils. Apart from Chapter 9, Snijders and Bosker (2012) concentrated on the analysis of a reduced set of 3758 pupils. In order to keep things simple, this section restricts the analysis to just two variables.

`d <- brandsma[, c("sch", "lpo")]`

The cluster variable is `sch`. The variable `lpo` is the pupil's test score at grade 8. The cluster variable is complete, but `lpo` has 204 missing values.

`md.pattern(d, plot = FALSE)`

```
     sch lpo    
3902   1   1   0
204    1   0   1
       0 204 204
```

How do we impute the 204 missing values? Let’s apply the following five methods:

1. `sample`: Find imputations by random sampling from the observed values in `lpo`. This method ignores `sch`;
2. `pmm`: Single-level predictive mean matching with the school indicator coded as a dummy variable;
3. `2l.pan`: Multilevel method using the linear mixed model to draw univariate imputations;
4. `2l.norm`: Multilevel method using the linear mixed model with heterogeneous error variances;
5. `2l.pmm`: Predictive mean matching based on predictions from the linear mixed model, with random draws from the regression coefficients and the random effects, using five donors.

The following code block will impute the data according to these five methods.

```
library(miceadds)
methods <- c("sample", "pmm", "2l.pan", "2l.norm", "2l.pmm")
result <- vector("list", length(methods))
names(result) <- methods
for (meth in methods) {
  d <- brandsma[, c("sch", "lpo")]
  pred <- make.predictorMatrix(d)
  pred["lpo", "sch"] <- -2
  result[[meth]] <- mice(d, pred = pred, meth = meth,
                         m = 10, maxit = 1,
                         print = FALSE, seed = 82828)
}
```

The code `-2` in the predictor matrix `pred` signals that `sch` is the cluster variable. There is only one variable with missing values here, so we do not need to iterate and can set `maxit = 1`. The `miceadds` package is needed for the `2l.pmm` method.

The `2l.pan` and `2l.norm` methods are the oldest multilevel methods. Method `2l.pan` is very fast, while method `2l.norm` is more flexible since the within-cluster error variances may differ. To see which of these methods should be preferred for these data, let us study the distribution of the standard deviation of `lpo` by school. Figure 7.1 shows that the standard deviation per school varies between 4 and 16, a fairly large spread. This suggests that `2l.norm` might be preferred here.
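The spread summarized in Figure 7.1 can be computed directly from the data created above, for example:

```r
# Standard deviation of lpo within each school, observed values only
sds <- with(d, tapply(lpo, sch, sd, na.rm = TRUE))
summary(sds)
```

A wide range in these per-school standard deviations argues for a heteroscedastic method such as `2l.norm`.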

Figure 7.2 shows the box plot of the observed data (in blue) and the imputed data (in red) under each of the methods. Box plots are drawn for schools with zero missing values, one missing value, two or three missing values, and more than three missing values. Pupils in schools with one to three missing values have lower scores than pupils from schools with complete data. Pupils from schools with more than three missing values score similarly to pupils from schools with complete data. It is interesting to study how well the different imputation methods preserve this feature of the data.

Method `sample` does not use any school information, and hence the imputations in all schools look alike. Methods `pmm`, `2l.pan`, `2l.norm` and `2l.pmm` preserve the pattern, though the differences are less pronounced than in the observed data. Note that the distributions of the two normal methods (`2l.pan` and `2l.norm`) have tails that extend beyond the range of the observed data (the maximum is 58). Hence, complete-data estimators based on the tails (e.g., finding the top 10 Dutch schools) can be distorted by this use of the normal imputation.
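This tail behavior is easy to inspect from the imputations themselves, using the `result` list created earlier:

```r
# Compare the largest imputed value under 2l.pan with the observed maximum
max(unlist(result[["2l.pan"]]$imp$lpo))
max(d$lpo, na.rm = TRUE)   # observed maximum is 58
```

Any imputed value exceeding the observed maximum signals that the normal model extrapolates beyond the data range.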

Figure 7.3 shows the density plot of the 10 sets of imputed values (red) compared with the density plot of the observed values (blue). The top row corresponds to the `2l.pan` method, and shows that some parts of the blue curve are not well represented by the imputed values. The method in the bottom row (`2l.pmm`) tracks the observed data distribution a little better.
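A plot like Figure 7.3 can be produced per method with the lattice-based `densityplot()` method for `mids` objects, for example:

```r
# Density of observed versus imputed lpo for the 2l.pmm imputations
library(mice)
densityplot(result[["2l.pmm"]], ~ lpo)
```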

Most research to date has concentrated on multilevel imputation under the normal model. In reality, normality is always an approximation, and how good this approximation needs to be depends on the substantive question. Two-level predictive mean matching is a promising alternative that can impute close to the data.