## 8.5 Bibliographic notes

Multiple imputation of the potential outcomes was suggested by Rubin (2004a). Several authors have experimented with multiple imputation of potential outcomes, all with the goal of estimating the ACE. Piesse et al. (2010) empirically demonstrated that, with proper adjustments, multiple imputation of potential outcomes in non-randomized experiments can approximate the results of randomized experiments. Bondarenko and Raghunathan (2010) augmented the data matrix with prior information, and showed the sensitivity of the results to different modeling assumptions. For the randomized design, Aarts, Van Buuren, and Frank (2010) found that multiple imputation of potential outcomes is more efficient than the \(t\)-test and on par with ANCOVA when all usual linear assumptions are met, and better when those assumptions are violated. Lam (2013) found that predictive mean matching performed well for imputing potential outcomes. Gutman and Rubin (2015) described a spline-based imputation method for binary data with good statistical properties. Imbens and Rubin (2015) showed how the ACE and \(\rho\) are independent, discussed various options for setting \(\rho\) and derived estimates of the ACE. Smink (2016) found that the quality of the ICE estimate depends on the quantile of the realized outcome, and concluded that proper modeling of the correlation between the potential outcomes is needed.

There is a vast class of methods that relate the observed scores \(Y_i\) to covariates \(X\) by least-squares or machine learning methods. These methods are conceptually and analytically distinct from the methods presented in this chapter. Some methods are advertised as estimating individual causal effects, but they actually target a different estimand. The relevant literature typically defines the individual causal effect as something like

\[ \tilde\tau_i = \mathrm{E}[Y | X = x_i, W_i = 1] - \mathrm{E}[Y | X = x_i, W_i = 0]\tag{8.14} \]

which is the difference between the predicted value under treatment and the predicted value under control for each individual. In order to quantify \(\tilde\tau_i\), one needs to estimate the components \(\mathrm{E}[Y | X = x_i, W_i = 1]\) and \(\mathrm{E}[Y | X = x_i, W_i = 0]\) from the data. In practice, the set of units \(i \in S_1\) used for estimating the first component differs from the set of units \(i \in S_0\) used for estimating the second. In that case, \(\tilde\tau_i\) takes the expectation over different sets of units, so \(\tilde\tau_i\) reflects not only the treatment effect, but also any effects that arise because the units in \(S_1\) and \(S_0\) are different, and even mutually exclusive. This violates the critical requirement for causal inference that “the comparison must be a comparison of \(Y_i(1)\) and \(Y_i(0)\) for a common set of units” (Rubin 2005, 323). If we aspire to take expectations over the *same* set of units, we need to make additional assumptions. Depending on such assumptions about the treatment assignment mechanism and about \(\rho\), there will be circumstances where \(\tau_i\) and \(\tilde\tau_i\) lead to the same estimates, but without such assumptions, the estimands \(\tau_i\) and \(\tilde\tau_i\) are generally different.
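To make the distinction concrete, the following sketch shows how \(\tilde\tau_i\) of equation (8.14) is typically computed in practice: two regression models are fitted on disjoint sets of units (treated and controls), and their predictions are differenced at each \(x_i\). The data, the linear model, and the constant true effect of 2 are all hypothetical illustrations, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: one covariate x, binary treatment w,
# outcome y with a constant true treatment effect of 2.
n = 200
x = rng.normal(size=n)
w = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * x + 2.0 * w + rng.normal(scale=0.3, size=n)

def fit_ols(x_sub, y_sub):
    """Least-squares fit of y = b0 + b1 * x on a subset of units."""
    X = np.column_stack([np.ones_like(x_sub), x_sub])
    beta, *_ = np.linalg.lstsq(X, y_sub, rcond=None)
    return beta

# The two components of (8.14) are estimated on *different* sets of
# units: S1 (the treated) and S0 (the controls).
b1 = fit_ols(x[w == 1], y[w == 1])   # estimates E[Y | X = x, W = 1]
b0 = fit_ols(x[w == 0], y[w == 0])   # estimates E[Y | X = x, W = 0]

# tilde-tau_i: difference of the two fitted predictions at each x_i.
tau_tilde = (b1[0] + b1[1] * x) - (b0[0] + b0[1] * x)
print(tau_tilde.mean())
```

Because \(S_1\) and \(S_0\) are disjoint, this difference of predictions is a statement about two different groups of units, not a comparison of \(Y_i(1)\) and \(Y_i(0)\) for a common set of units; it coincides with a causal quantity only under additional assumptions about the assignment mechanism.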

I realize that the methods presented in this chapter only scratch the surface of a tremendous, yet largely unexplored field. The methodology is in a nascent state, and I hope that the materials in this chapter will stimulate further research in the area.