5.2 Parameter pooling
5.2.1 Scalar inference of normal quantities
Section 2.4 describes Rubin’s rules for pooling the results from the \(m\) complete-data analyses. These rules are based on the assumption that the parameter estimates \(\hat Q\) are normally distributed around the population value \(Q\) with a variance of \(U\). Many types of estimates are approximately normally distributed, e.g., means, standard deviations, regression coefficients, proportions and linear predictors. Rubin’s pooling rules can be applied directly to such quantities (Schafer 1997; Marshall, Billingham, and Bryan 2009).
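As a brief recap of the scalar case, and writing \(\hat Q_\ell\) and \(U_\ell\) for the estimate and its variance in the \(\ell\)-th imputed dataset (this notation is an assumption here and may differ slightly from Section 2.4), the pooled estimate, the within-imputation variance, the between-imputation variance and the total variance are

\[ \bar Q = \frac{1}{m}\sum_{\ell=1}^m \hat Q_\ell, \qquad \bar U = \frac{1}{m}\sum_{\ell=1}^m U_\ell, \qquad B = \frac{1}{m-1}\sum_{\ell=1}^m (\hat Q_\ell - \bar Q)^2, \qquad T = \bar U + \left(1+\frac{1}{m}\right)B \]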
5.2.2 Scalar inference of non-normal quantities
How should we combine quantities with non-normal distributions: correlation coefficients, odds ratios, relative risks, hazard ratios, measures of explained variance and so on? The quality of the pooled estimate and of its confidence interval can be improved when pooling is done on a scale for which the sampling distribution is close to normal. Thus, transforming toward normality, pooling on the transformed scale, and back-transforming the result into the original scale can improve statistical inference.
As an example, consider transforming a correlation coefficient \(\rho_\ell\) for \(\ell=1,\dots,m\) toward normality using the Fisher \(z\) transformation
\[ z_\ell = \frac{1}{2}\ln{\frac{1+\rho_\ell}{1-\rho_\ell}}\tag{5.1} \]
For large samples, the distribution of \(z_\ell\) is approximately normal with variance \(\sigma^2 = 1/(n-3)\), where \(n\) is the sample size. It is straightforward to calculate the pooled estimate \(\bar z\) on the \(z\)-scale and its variance by Rubin’s rules. The result can be back-transformed by the inverse Fisher transformation
\[ \bar \rho = \frac{e^{2\bar z}-1}{e^{2\bar z}+1}\tag{5.2} \]
The confidence interval of \(\bar \rho\) is calculated in the \(z\)-scale as usual, and then back-transformed by Equation (5.2).
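The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pool_correlations` and its arguments are hypothetical, every imputed dataset is assumed to have the same sample size \(n\), and the interval uses a normal quantile as a large-sample simplification instead of the \(t\) reference distribution that Rubin’s rules normally use.

```python
import math
from statistics import NormalDist, mean, variance


def pool_correlations(rhos, n, level=0.95):
    """Pool m correlations on the Fisher z scale (hypothetical helper).

    rhos  : correlation estimates, one per imputed dataset (m >= 2)
    n     : sample size, assumed identical across imputed datasets
    level : confidence level of the interval
    """
    m = len(rhos)
    z = [math.atanh(r) for r in rhos]      # Fisher z, Equation (5.1)
    z_bar = mean(z)                        # pooled estimate on the z scale
    u_bar = 1.0 / (n - 3)                  # within-imputation variance
    b = variance(z)                        # between-imputation variance (divisor m - 1)
    t = u_bar + (1.0 + 1.0 / m) * b        # total variance
    # Simplification: normal quantile instead of the t reference distribution.
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = q * math.sqrt(t)
    return {
        "rho": math.tanh(z_bar),           # inverse Fisher z, Equation (5.2)
        "lower": math.tanh(z_bar - half),  # interval computed on the z scale,
        "upper": math.tanh(z_bar + half),  # then back-transformed
        "total_var_z": t,
    }


# Illustrative input only: five made-up correlations from m = 5 imputations.
result = pool_correlations([0.42, 0.47, 0.45, 0.40, 0.44], n=200)
```

Note that `math.atanh` and `math.tanh` are exactly the Fisher transformation and its inverse, so the back-transformed interval is asymmetric around \(\bar\rho\) and always stays within \((-1, 1)\).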
Statistic | Transformation | Source |
---|---|---|
Correlation | Fisher \(z\) | Schafer (1997) |
Odds ratio | Logarithm | Agresti (1990) |
Relative risk | Logarithm | Agresti (1990) |
Hazard ratio | Logarithm | Marshall, Billingham, and Bryan (2009) |
Explained variance \(R^2\) | Fisher \(z\) on root | Harel (2009) |
Survival probabilities | Complementary log-log | Marshall, Billingham, and Bryan (2009) |
Survival distribution | Logarithm | Marshall, Billingham, and Bryan (2009) |
Table 5.2 suggests transformations toward approximate normality for various types of statistics. For some quantities, however, the sampling distribution is complex or unknown. Examples include the Cramér \(C\) statistic (Brand 1999) and the discrimination index (Marshall, Billingham, and Bryan 2009). Ideally, the entire sampling distribution should be pooled in such cases, but the corresponding pooling methods have yet to be developed. The current advice is to search for an ad hoc transformation that brings the sampling distribution close to normal, and then apply Rubin’s rules.
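For the logarithm rows of Table 5.2, pooling follows the same pattern: apply Rubin’s rules to the log estimates and exponentiate at the end. The sketch below assumes that each complete-data analysis supplies a log odds ratio together with its standard error on the log scale, as a logistic regression would. The function name `pool_log_odds_ratios` is hypothetical, and the interval again uses a normal quantile as a large-sample simplification.

```python
import math
from statistics import NormalDist, mean, variance


def pool_log_odds_ratios(log_ors, std_errors, level=0.95):
    """Pool m odds ratios on the log scale (hypothetical helper).

    log_ors    : log odds ratios, one per imputed dataset (m >= 2)
    std_errors : their standard errors on the log scale
    level      : confidence level of the interval
    """
    m = len(log_ors)
    q_bar = mean(log_ors)                            # pooled log odds ratio
    u_bar = mean([se ** 2 for se in std_errors])     # within-imputation variance
    b = variance(log_ors)                            # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b                  # total variance
    # Simplification: normal quantile instead of the t reference distribution.
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = q * math.sqrt(t)
    return {
        "odds_ratio": math.exp(q_bar),               # back-transform the estimate
        "lower": math.exp(q_bar - half),             # back-transform the interval
        "upper": math.exp(q_bar + half),
        "total_var_log": t,
    }


# Illustrative input only: made-up results from m = 3 imputations.
pooled = pool_log_odds_ratios([0.69, 0.74, 0.66], [0.21, 0.20, 0.22])
```

The same function would serve for relative risks or hazard ratios, provided the inputs are the log estimates and their log-scale standard errors.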