7.2 Notation for multilevel models

Multilevel data have a hierarchical, or clustered, structure. The archetypical example is data from pupils who are nested within classes. Some of the data may relate to pupils (e.g., test scores), whereas other data concern the class level (e.g., class size). Another example arises in longitudinal studies, where the individual’s responses over time are nested within individuals. Some of the data vary with time (e.g., disease history), whereas other data vary between individuals (e.g., sex). The term multilevel analysis refers to the methodology to analyze data with such multilevel structure, a methodology that can be traced back to the definition of the intra-class correlation (ICC) by Fisher (1925). Multilevel analysis is quite different from methods for single-level data. The analysis of multilevel data is a vast topic, and this is not the place to cover the model in detail. There are excellent introductions by Raudenbush and Bryk (2002), Gelman and Hill (2007), Snijders and Bosker (2012), Fitzmaurice, Laird, and Ware (2011), and Hox, Moerbeek, and Van de Schoot (2018). This chapter assumes basic familiarity with these models.

A challenging aspect of multilevel analysis is the existence of a variety of notational systems and concepts. This section describes three different notations. In order to illustrate these notations, we use data on school performance of grade 8 pupils in Dutch schools. These data were collected by Brandsma and Knuver (1989), and were used as the primary examples in Chapters 4 and 5 of Snijders and Bosker (2012). The data are available as the brandsma object in the mice package. The data contain a mix of both pupil-level measurements and school-level measurements.

library(mice)
data("brandsma", package = "mice")
d <- brandsma[, c("sch", "lpo", "sex", "den")]

Let us concentrate on four variables, each representing a different role in the multilevel model:

School number, cluster variable;
Language test post, outcome at pupil level;
Sex of pupil, predictor at pupil level;
School denomination, predictor at school level.

The scientific interest is to create a model for predicting the outcome lpo from the level-1 predictor sex (coded as 0-1) and the level-2 predictor den (which takes values 1-4). Let the data be divided into \(C\) clusters (e.g., classes, schools), indexed by \(c\) (\(c=1,\dots,C\)). Each cluster holds \(n_c\) units, indexed by \(i=1,\dots,n_c\). There are three ways to write the same model (Scott, Shrout, and Weinberg 2013).

In level notation, introduced by Bryk and Raudenbush (1992), we formulate a multilevel model as a system of two equations, one at level-1, and two at level-2:

\[\begin{align} {{\texttt{lpo}}}_{ic} & = \beta_{0c} + \beta_{1c}{{\texttt{sex}}}_{ic} + \epsilon_{ic}\tag{7.1}\\ \beta_{0c} & = \gamma_{00} + \gamma_{01}{{\texttt{den}}}_{c} + u_{0c}\tag{7.2}\\ \beta_{1c} & = \gamma_{10}\tag{7.3} \end{align}\]

where \({{\texttt{lpo}}}_{ic}\) is the test score of pupil \(i\) in school \(c\), where \({{\texttt{sex}}}_{ic}\) is the sex of pupil \(i\) in school \(c\), and where \({{\texttt{den}}}_c\) is the religious denomination of school \(c\). Note that here the subscripts distinguish the level-1 and level-2 variables. In this notation, \({{\texttt{sex}}}_{ic}\) only appears in the level-1 model (7.1), and \({{\texttt{den}}}_c\) only appears in the level-2 model (7.2). The term \(\beta_{0c}\) is a random intercept that varies by cluster, while \(\beta_{1c}\) is a sex effect that is assumed to be the same across schools. The term \(\epsilon_{ic}\) is the within-cluster random residual at the pupil level with a normal distribution \(\epsilon_{ic} \sim N(0, \sigma_{\epsilon}^2)\). The first level-2 model describes the variation in the mean test score between schools as a function of the grand mean \(\gamma_{00}\), a school-level effect \(\gamma_{01}\) of denomination and a school-level random residual with a normal distribution \(u_{0c} \sim N(0, \sigma_{u_0}^2)\). The second level-2 model does not have a random residual, so this specifies that \(\beta_{1c}\) is a fixed effect equal in value to \(\gamma_{10}\). The unknowns to be estimated are the fixed parameters \(\gamma_{00}\), \(\gamma_{01}\) and \(\gamma_{10}\), and the variance components \(\sigma_{\epsilon}^2\) and \(\sigma_{u_0}^2\).

We may write the same model as a single predictive equation by substituting the level-2 models into the level-2 model:

\[ \tag{7.4} {{\texttt{lpo}}}_{ic} = \gamma_{00} + \gamma_{10}{{\texttt{sex}}}_{ic} + \gamma_{01}{{\texttt{den}}}_{c} + u_{0c} + \epsilon_{ic}, \]

We do not need the double subscripts any more, so we write the model in composite notation as

\[ \tag{7.5} {{\texttt{lpo}}}_{ic} = \beta_0 + \beta_1{{\texttt{sex}}}_{ic} + \beta_2{{\texttt{den}}}_{c} + u_{0c} + \epsilon_{ic}, \]

Note that these \(\beta\)’s are fixed effects and the \(\beta\)’s in the level-1 model (7.1) are random effects. They differ by the number of subscripts.

The same model written in matrix notation is widely known as the linear mixed effects model (Laird and Ware 1982) and can be written as

\[ y_c = X_c\beta + Z_cu_c + \epsilon_c \tag{7.6} \]

where \(y_c = {{\texttt{lpo}}}_c\) is a column vector containing the scores in cluster \(c\), where \(X_c = ({{\texttt{1}}}, {{\texttt{den}}}_c, {{\texttt{sex}}}_c)\) is the \(n_c \times 3\) design matrix in class \(c\) associated with the fixed effects, and where \(Z_c = ({{\texttt{1}}})\) is a column with \(n_c\) 1’s associated with the random intercept \(u_{0c}\). In the general model, \(u_c\) has length \(q\) and consists of both random intercepts and random slopes, which are normally distributed as \(u_c \sim N(0,\Omega)\). The \(q\) random effects are typically a subset of the \(p\) fixed effects, so \(q \leq p\).

The different formulations of the multilevel model reflect different scientific traditions. The matrix model formulation is favoured among statisticians because all multilevel models can be economically expressed by Equation (7.6), which eases estimation of parameters, statistical inference and prediction. The books by McCulloch and Searle (2001), Verbeke and Molenberghs (2000), De Leeuw and Meijer (2008) and Fitzmaurice, Laird, and Ware (2011) are representative for this tradition. The level formulation clearly separates the roles of level-1 and level-2 variables, which eases interpretation of the model. Also, the error structure in each equation is simpler than in the two other formulations. The level formulation was popularized by Bryk and Raudenbush (1992), and is common in the social sciences. The composite formulation covers the middle ground. It gives a balance between compactness and clarity, and is used by the introductions into multilevel analysis for applied researchers by Snijders and Bosker (2012) and Hox, Moerbeek, and Van de Schoot (2018). Gelman and Hill (2007) provide a Bayesian approach to multilevel analysis using a related though slightly different terminology. See Fitzmaurice, Laird, and Ware (2011, 203–8, Ch. 22) and Scott, Shrout, and Weinberg (2013) for more detail on the relations between the terminologies.

Although the models are mathematically equivalent, the notation has implications on the ease with which imputation models can be specified. Equation (7.6) nicely separates the fixed \(X_c\) from random effects \(Z_c\), but the same covariates may appear in both \(X_c\) and \(Z_c\). This complicates imputation of those covariates in an FCS framework because the same variable appears two times in the model. We surely want to prevent the scenario where imputed versions of the same covariate would become different in \(X_c\) and \(Z_c\). The level formulation distinguishes level-1 variables from level-2 predictors. No overlap occurs between the sets of variables, so level formulation is natural for developing imputation procedures for variables at different levels. Likewise, the composite notation makes it easier to see how the imputation models must be set up in FCS, and is natural for studying the effect of interactions.