7.3 Missing values in multilevel data

In single-level data, missing values may occur in the outcome, in the predictors, or in both. The situation for multilevel data is more complex. Missing values in the measured variables of the multilevel model can occur in

the outcome variable;
the level-1 predictors;
the level-2 predictors;
the class variable.

This chapter assumes that the class variable is always completely observed. In real life, this may not be the case, and techniques detailed in Chapter 3 can be used to impute class membership. See Hill and Goldstein (1998) and Goldstein (2011b) for models to handle missing class identification.

7.3.1 Practical issues in multilevel imputation

In single-level models, the impact of the missing values on the analysis depends on where in the model they occur. This is also the case in multilevel analysis. In fact, the multilevel model is very well equipped to handle missing values in the outcomes. Missing values in the predictors are generally more difficult to handle. In a study of children, for example, some children may have missing values for age, occupational status of the father, ethnic background, and so on. In longitudinal applications, missing data may occur in time-varying covariates, like nutritional status and stage of pubertal development. Most mixed-model software cannot handle such missing values, and will remove children with any missing values in the level-1 predictors prior to analysis.

Missing data in the level-2 predictors occur if, for example, it is not known whether a school is public or private. In a longitudinal setting, missing data in fixed person characteristics, like sex or education, lead to incomplete level-2 predictors. The consequences of such missing values can be even larger. The typical fix is to delete all records in the class. For example, suppose that the model contains the professional qualification of the teacher. If the qualification is missing, the data of all pupils in the class are disregarded.
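The difference between the two kinds of loss can be made concrete with a small sketch. The records below are invented for illustration: one pupil in school A and one in school C miss a level-1 predictor (age), while school B misses a level-2 predictor (school type). Listwise deletion removes single rows in the first case and the whole cluster in the second.

```python
# Toy pupil records: (school, pupil_age, school_type). None marks a missing value.
# All values are invented for illustration only.
records = [
    ("A", 10, "public"), ("A", None, "public"), ("A", 11, "public"),
    ("B", 10, None), ("B", 12, None), ("B", 11, None),
    ("C", 9, "private"), ("C", 10, "private"), ("C", None, "private"),
]

# Listwise deletion keeps only rows without any missing value.
complete = [r for r in records if None not in r]

def rows_per_school(rows):
    """Count the number of records per school."""
    counts = {}
    for school, _, _ in rows:
        counts[school] = counts.get(school, 0) + 1
    return counts

before = rows_per_school(records)
after = rows_per_school(complete)
lost_schools = sorted(set(before) - set(after))

print(before)        # {'A': 3, 'B': 3, 'C': 3}
print(after)         # {'A': 2, 'C': 2}
print(lost_schools)  # ['B']
```

Schools A and C each lose one pupil, but school B disappears from the analysis altogether, which is where selection effects at the cluster level can enter.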

Many multilevel models define derived variables as part of the analysis, like the cluster means of a level-1 predictor, the product of two level-1 predictors, the dummy-coded version of a categorical variable, the disaggregated version of a level-2 predictor, and so on. We can calculate such derived variables from the data and include them in the model as needed, but of course this is only possible when the data are complete. Although the derived variables themselves need not be imputed (because we can always recalculate them from the imputed data), the imputation model needs to be aware of, and account for, the role that such derived variables play in the complete-data model.
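As a minimal sketch of this point, consider the cluster mean of a level-1 predictor. The helper `cluster_means` and the data are hypothetical. The mean is undefined as long as a value in the cluster is missing, and can simply be recalculated once the level-1 values have been imputed.

```python
def cluster_means(clusters, values):
    """Return the mean of `values` per cluster, or None if any value is missing."""
    grouped = {}
    for c, v in zip(clusters, values):
        grouped.setdefault(c, []).append(v)
    means = {}
    for c, vals in grouped.items():
        if any(v is None for v in vals):
            means[c] = None  # derived variable cannot be computed
        else:
            means[c] = sum(vals) / len(vals)
    return means

clusters = ["A", "A", "B", "B"]
x_incomplete = [1.0, None, 3.0, 5.0]
x_imputed = [1.0, 2.0, 3.0, 5.0]  # 2.0 is a hypothetical imputed value

print(cluster_means(clusters, x_incomplete))  # {'A': None, 'B': 4.0}
print(cluster_means(clusters, x_imputed))     # {'A': 1.5, 'B': 4.0}
```

The sketch only shows the recalculation step. It does not show how the imputation model itself should account for the cluster mean, which is the harder part of the problem.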

In practice, complications may arise due to the nature of the data or model. Some of these are as follows:

For small clusters the within-cluster mean and variance are unreliable estimates, so the choice of the prior distribution becomes critical.
For a small number of clusters, it is difficult to estimate the between-cluster variance of the random effects.
In applications with systematically missing data, there are no observed values in the cluster, so the cluster location cannot be estimated.
The variation of the random slopes can be large, so the method used to deal with the missing data should account for this.
The error variance \(\sigma_\epsilon^2\) may differ across clusters (heteroscedasticity).
The residual error distributions can be far from normal.
The model contains aggregates of the level-1 variables, such as cluster means, which need to be taken into account during imputation.
The model contains interactions, or other nonlinear terms.
The multilevel model may be so complex that it cannot be fitted, or that fitting it runs into convergence problems.

Table 7.1: Questions to gauge the complexity of a multilevel imputation task.
1.	Will the complete-data model include random slopes?
2.	Will the data contain systematically missing values?
3.	Will the distribution of the residuals be non-normal?
4.	Will the error variance differ over clusters?
5.	Will there be small clusters?
6.	Will there be a small number of clusters?
7.	Will the complete-data model have cross-level interactions?
8.	Will the dataset be very large?

There is not one super-method that will address all such issues. In practice, we may need to emphasize certain issues at the expense of others. In order to gauge the complexity of the imputation task for a particular dataset and model, ask yourself the questions listed in Table 7.1. If your answer to all questions is “NO”, then there are several methods for multilevel MI that are available in standard software. If many of your answers are “YES”, the situation is less clear-cut, and you may need to think about the relative priority of the questions in light of the needs of the application.
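For readers who like to keep such a checklist next to their analysis scripts, the questions of Table 7.1 can be recorded as a small helper. This is only a bookkeeping sketch: the names `QUESTIONS` and `flagged_issues` are made up here, and the function returns the issues that were answered “YES” without attempting to rank them.

```python
# Short labels for the eight questions of Table 7.1, in the same order.
QUESTIONS = [
    "random slopes in the complete-data model",
    "systematically missing values",
    "non-normal residuals",
    "error variance differs over clusters",
    "small clusters",
    "small number of clusters",
    "cross-level interactions in the complete-data model",
    "very large dataset",
]

def flagged_issues(answers):
    """Return the labels of all questions answered YES (True)."""
    if len(answers) != len(QUESTIONS):
        raise ValueError("expected one answer per question in Table 7.1")
    return [q for q, yes in zip(QUESTIONS, answers) if yes]

# Hypothetical application: random slopes and small clusters, nothing else.
answers = [True, False, False, False, True, False, False, False]
print(flagged_issues(answers))
# ['random slopes in the complete-data model', 'small clusters']
```

An empty list corresponds to the all-“NO” case. For a non-empty list, the relative priority of the flagged issues still has to be judged in light of the application.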

7.3.2 Ad-hoc solutions for multilevel data

Missing values in the level-1 predictors or the level-2 predictors have long been treated by listwise deletion. This is easy to do, but may have severe adverse effects, especially for missing values in level-2 predictors. For example, we may not know whether a school is public or private. Ignoring all records pertaining to that school is not only wasteful, but may also lead to selection effects at cluster level. While listwise deletion could be useful when the variance of the slopes is large, it is not generally recommended (Grund, Lüdtke, and Robitzsch 2016a).

Another ad-hoc solution is to ignore the clustering and impute the data by a single-level method. It is known that this will underestimate the intra-class correlation (Taljaard, Donner, and Klar 2008; Van Buuren 2011; Enders, Mistler, and Keller 2016). Lüdtke, Robitzsch, and Grund (2017) derived an expression for the asymptotic bias for the intra-class correlation under the random intercept model. The amount of underestimation grows with the ICC and with the missing data rate. Increasing the cluster size hardly aids in reducing this bias. In addition, the regression weights for the fixed effects will be biased. Grund, Lüdtke, and Robitzsch (2018b) conclude that single-level imputation should be avoided unless only a few cases contain missing data (e.g., less than 5%) and the intra-class correlation is low (e.g., less than .10). Conducting multiple imputation with the wrong model (e.g., single-level methods) can be more hazardous than listwise deletion.
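The direction of this bias is easy to reproduce in a small simulation. The sketch below makes several simplifying assumptions that are not part of the text: a random intercept model without predictors, equally sized clusters, MCAR missingness, a single imputed dataset, and imputations drawn from a normal distribution with the overall observed mean and variance (so parameter uncertainty is ignored). All settings (`J`, `n`, `tau2`, `sigma2`, `p_miss`) are arbitrary choices.

```python
import random

def icc_anova(groups):
    """One-way ANOVA estimator of the ICC for equally sized groups."""
    J, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (J * n)
    means = [sum(g) / n for g in groups]
    msb = n * sum((m - grand) ** 2 for m in means) / (J - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (J * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

rng = random.Random(1)
J, n = 200, 10               # number of clusters, cluster size
tau2, sigma2 = 0.3, 0.7      # between and within variance, true ICC = 0.3
p_miss = 0.4                 # MCAR missingness probability

# Random intercept data: y_ij = u_j + e_ij
complete = []
for _ in range(J):
    u = rng.gauss(0, tau2 ** 0.5)
    complete.append([u + rng.gauss(0, sigma2 ** 0.5) for _ in range(n)])

# Delete outcomes completely at random.
observed = [[y if rng.random() > p_miss else None for y in g] for g in complete]

# Single-level imputation: ignore the clusters and draw from the
# overall distribution of the observed values.
obs = [y for g in observed for y in g if y is not None]
mu = sum(obs) / len(obs)
sd = (sum((y - mu) ** 2 for y in obs) / (len(obs) - 1)) ** 0.5
imputed = [[y if y is not None else rng.gauss(mu, sd) for y in g] for g in observed]

icc_complete = icc_anova(complete)
icc_single = icc_anova(imputed)
print(round(icc_complete, 3), round(icc_single, 3))
```

Because the imputed values carry no information about the cluster they belong to, the ICC of the imputed data falls well below the ICC of the complete data.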

Another ad-hoc technique is to add a dummy variable for each cluster, so that the model estimates a separate coefficient for each cluster. The coefficients are estimated by ordinary least squares, and the parameters are drawn from their posteriors. If the missing values are restricted to the outcome, this method will estimate the fixed effects quite well, but it also artificially inflates the true variation between groups, and thus biases the ICC upwards (Andridge 2011; Van Buuren 2011; Graham 2012). If there are also missing values in the predictors, the level-1 regression weights will be unbiased, but the level-2 weights are biased, in particular for small clusters and low ICC. See Lüdtke, Robitzsch, and Grund (2017), who also derive the asymptotic bias, for more detail. If the primary interest is in the fixed effects, adding a cluster dummy is an easily implementable alternative, unless the missing rate is very large and/or the intra-class correlation is very low and the number of records in the cluster is small (Drechsler 2015; Lüdtke, Robitzsch, and Grund 2017). Since the bias in random slopes and variance components can be substantial, one should turn to multilevel imputation to obtain proper estimates of those parts of the multilevel model (Speidel, Drechsler, and Sakshaug 2017).
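The upward bias of the ICC under the cluster-dummy approach can be illustrated in the same spirit. The sketch below is a simplification under stated assumptions: an intercept-only model in which the cluster coefficient is the mean of the observed values in the cluster, the first `m` outcomes of each cluster observed and the rest missing, cluster coefficients drawn from a normal approximation to their posterior, and the residual variance fixed at its pooled estimate rather than drawn. All settings are arbitrary and chosen to represent small clusters with a low ICC.

```python
import random

def icc_anova(groups):
    """One-way ANOVA estimator of the ICC for equally sized groups."""
    J, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (J * n)
    means = [sum(g) / n for g in groups]
    msb = n * sum((m - grand) ** 2 for m in means) / (J - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (J * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

rng = random.Random(2)
J, n, m = 300, 6, 3          # clusters, cluster size, observed values per cluster
tau2, sigma2 = 0.1, 0.9      # true ICC = 0.1

# Random intercept data: y_ij = u_j + e_ij
complete = []
for _ in range(J):
    u = rng.gauss(0, tau2 ** 0.5)
    complete.append([u + rng.gauss(0, sigma2 ** 0.5) for _ in range(n)])

# With one dummy per cluster and no other predictors, the least squares
# coefficient of a cluster is the mean of its observed values.
obs_means = [sum(g[:m]) / m for g in complete]

# Pooled residual variance around the cluster coefficients.
s2 = sum((y - mean) ** 2
         for g, mean in zip(complete, obs_means)
         for y in g[:m]) / (J * (m - 1))

imputed = []
for g, mean in zip(complete, obs_means):
    beta = rng.gauss(mean, (s2 / m) ** 0.5)   # draw the cluster coefficient
    draws = [rng.gauss(beta, s2 ** 0.5) for _ in range(n - m)]
    imputed.append(g[:m] + draws)

icc_complete = icc_anova(complete)
icc_dummy = icc_anova(imputed)
print(round(icc_complete, 3), round(icc_dummy, 3))
```

Each cluster coefficient is estimated from only three values, so its sampling error is passed on to every imputed value in that cluster. The imputed values therefore resemble each other more than real values from the same cluster would, and the ICC of the imputed data exceeds that of the complete data.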

Vink, Lazendic, and Van Buuren (2015) described an application to Australian school data with over 2.8 million records, where a dummy variable per school was combined with predictive mean matching. Given the size and complexity of the imputation problem, this application would have been computationally infeasible with full multilevel imputation. Thus, for large databases, adding a dummy variable per cluster is a practical and useful technique for estimating the fixed effects.

7.3.3 Likelihood solutions

The multilevel model is actually “made to solve” the problem of missing values in the outcome. There is an extensive literature, especially for longitudinal data (Verbeke and Molenberghs 2000; Molenberghs and Verbeke 2005; Daniels and Hogan 2008). For more details, see the encyclopaedic overview in Fitzmaurice et al. (2009). Multilevel models have the ability to handle data with varying time points, which is an advance over traditional repeated-measures ANOVA, where the usual treatment is to remove the entire case if one of the outcomes is missing. Multilevel models do not assume an equal number of occasions or fixed time points, so all cases can be used for analysis.
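The contrast with repeated-measures ANOVA comes down to the data layout. The following sketch uses invented measurements for four children at three occasions. It only counts which cases each approach can use, and does not fit any model.

```python
# Wide format: one row per child, one outcome per occasion. None = missing.
# All values are invented for illustration only.
wide = {
    "child1": [100, 104, 109],
    "child2": [98, None, 107],
    "child3": [None, None, 111],
    "child4": [101, 105, None],
}

# Traditional repeated-measures ANOVA requires complete cases.
anova_cases = [cid for cid, ys in wide.items() if None not in ys]

# A multilevel model works on the long format: one row per observed
# (child, occasion, outcome) triple. Missing occasions are simply absent.
long_rows = [(cid, t, y)
             for cid, ys in wide.items()
             for t, y in enumerate(ys)
             if y is not None]
multilevel_cases = sorted({cid for cid, _, _ in long_rows})

print(anova_cases)       # ['child1']
print(len(long_rows))    # 8
print(multilevel_cases)  # ['child1', 'child2', 'child3', 'child4']
```

Only one child survives the complete-case requirement, whereas the long format retains all four children, including the child with a single observed occasion.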

Missing outcome data are easily handled in modern likelihood-based methods. Snijders and Bosker (2012, 56) write that the model “can even be applied if some groups have sample size 1, as long as other groups have greater sizes.” Of course, this statement holds only insofar as the assumptions of the model are met: the data are missing at random and the model is correctly specified.

Mixed-effects models can be fit with maximum-likelihood methods, which take care of missing data in the dependent variable. This principle can be extended to address missing data in explanatory variables in (multilevel) software for structural equation modeling like Mplus (Muthén, Muthén, and Asparouhov 2016) and gllamm (Rabe-Hesketh, Skrondal, and Pickles 2002). Grund, Lüdtke, and Robitzsch (2018b) remarked that such extensions could alter the meaning and value of the parameters of interest.