11.4 Conclusion

This chapter described techniques for imputing longitudinal data in both the wide and long formats. Some tasks are easier in the wide format, e.g., calculating change scores or imputing data, while others are easier in the long format, e.g., graphics and advanced statistical modeling. It is therefore useful to have both formats available.
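As a small illustration of why both formats are worth keeping around, the sketch below converts a toy data set between the long and wide formats and computes a change score in the wide format. The records, the field names (`id`, `occasion`, `weight`) and the helper functions are hypothetical and serve only as an example. The sketch assumes a common, fixed set of measurement occasions.

```python
# Hypothetical long-format records: one row per (person, occasion).
long_rows = [
    {"id": 1, "occasion": 0, "weight": 3.4},
    {"id": 1, "occasion": 1, "weight": 7.9},
    {"id": 2, "occasion": 0, "weight": 3.1},
    {"id": 2, "occasion": 2, "weight": 11.8},
]


def long_to_wide(rows, occasions):
    """One entry per person; unobserved occasions become None (missing)."""
    wide = {}
    for r in rows:
        wide.setdefault(r["id"], {occ: None for occ in occasions})
        wide[r["id"]][r["occasion"]] = r["weight"]
    return wide


def wide_to_long(wide):
    """Back to one row per observed (person, occasion) pair."""
    return [
        {"id": i, "occasion": occ, "weight": w}
        for i, row in sorted(wide.items())
        for occ, w in sorted(row.items())
        if w is not None
    ]


def change_score(wide, t0, t1):
    """Change between two occasions; None if either value is missing."""
    out = {}
    for i, row in wide.items():
        a, b = row[t0], row[t1]
        out[i] = None if a is None or b is None else b - a
    return out


wide = long_to_wide(long_rows, occasions=[0, 1, 2])
changes = change_score(wide, 0, 1)   # person 2 has no value at occasion 1
```

In the wide format the change score is a simple difference between two columns, and the `None` entries mark exactly the cells that an imputation method would have to fill in.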

The methodology for imputing data in the wide format is not really different from that of cross-sectional data. Where possible, convert the data into the wide format before imputation. If the data have been observed at irregular time points, as in the Terneuzen Birth Cohort, direct conversion into the wide format is not possible, and the data can instead be imputed in the long format by multilevel imputation.

This chapter introduced time raster imputation, a technique for converting data with an irregular age spacing into the wide format by means of imputation. Time rastering seems to work well in the sense that the generated trajectories follow the individual trajectories. The technique is still experimental and may need further refinement before it can be used routinely.

The current method inserts missing data at the full time grid, and thus imputes data even at time points where there are real observations. One obvious improvement would be to strip such points from the grid so that they are not imputed. For example, in the Terneuzen Birth Cohort this means that we would always take observed birth weight when it is measured.
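A minimal sketch of this refinement is given below. For one individual, it keeps only those grid ages at which no real observation exists, so that observed values at grid ages are never replaced by imputations. The function name, the tolerance argument and the example ages are assumptions made for illustration, not part of the method as described in this chapter.

```python
def grid_ages_to_impute(observed_ages, grid, tol=1e-8):
    """Return the grid ages for which a missing row must still be inserted.

    A grid age is dropped when the individual has a real observation at
    (numerically) the same age, so the observed value is used as is.
    """
    return [
        g for g in grid
        if not any(abs(g - a) <= tol for a in observed_ages)
    ]


# Hypothetical grid (in years) and observation ages for one child.
grid = [0.0, 0.5, 1.0, 2.0]
observed = [0.0, 0.7, 2.0]          # measured at birth, 0.7 years, 2 years

to_impute = grid_ages_to_impute(observed, grid)
```

Here the child was measured at birth and at age 2, so only the grid ages 0.5 and 1.0 remain to be imputed.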

Another potential improvement is to use the OLS estimates within each cluster as the center of the posterior predictive distribution rather than their shrunken versions. This would decrease within-cluster variability in the imputations, and increase between-cluster variability. It is not yet clear how to deal with clusters with only a few time points, but this modification is likely to produce age-to-age correlations that are most faithful to the data.
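The difference between the two centers can be illustrated with a deliberately simplified case. The sketch below assumes a random-intercept model with known variance components, which is much simpler than the model used for time raster imputation. In that setting the OLS estimate for a cluster is its own mean, and the shrunken estimate pulls that mean toward the overall mean. All names and numbers are hypothetical.

```python
def ols_mean(y):
    """OLS estimate of the cluster mean: uses the cluster's data only."""
    return sum(y) / len(y)


def shrunken_mean(y, mu, tau2, sigma2):
    """Shrunken estimate under a random-intercept model (an assumption here).

    mu     : overall mean
    tau2   : between-cluster variance
    sigma2 : within-cluster (residual) variance
    The weight lam tends to 1 as the cluster has more observations.
    """
    n = len(y)
    lam = tau2 / (tau2 + sigma2 / n)
    return lam * ols_mean(y) + (1 - lam) * mu


mu, tau2, sigma2 = 8.0, 1.0, 2.0
sparse = [10.0, 12.0]               # cluster with two observations
dense = [11.0] * 8                  # cluster with eight observations

center_sparse = shrunken_mean(sparse, mu, tau2, sigma2)
center_dense = shrunken_mean(dense, mu, tau2, sigma2)
```

Both clusters have an OLS mean of 11, but the sparse cluster is shrunk further toward the overall mean of 8 than the dense one. Centering the imputations on the OLS estimates would remove this pull, which is why the between-cluster variability of the imputations would increase. The sparse cluster also shows the open problem: with few time points, the OLS estimate itself is unstable.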

Finally, the selection of the data could be much stricter. The analysis of the Terneuzen Birth Cohort data used a very liberal inclusion criterion that required a minimum of only three data points across the entire age range. Sparse trajectories will have large imputation variances, and may thus bias the age-to-age correlations toward zero. As a preliminary rule of thumb, there should be at least one, and preferably two or more, measurements per period.
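The rule of thumb can be checked mechanically before imputation. The sketch below counts the measurements of one individual within each period and compares the result with the liberal three-point criterion. The break ages, the function names and the convention that a period includes its lower but not its upper bound (except for the last period) are assumptions made for this illustration.

```python
from bisect import bisect_right


def counts_per_period(ages, breaks):
    """Number of measurements in each period [breaks[k], breaks[k+1]).

    The last period also includes its upper bound. Ages outside the
    range spanned by the breaks are ignored.
    """
    counts = [0] * (len(breaks) - 1)
    for a in ages:
        if breaks[0] <= a <= breaks[-1]:
            k = min(bisect_right(breaks, a) - 1, len(counts) - 1)
            counts[k] += 1
    return counts


def meets_rule(ages, breaks, min_per_period=1):
    """True if every period holds at least min_per_period measurements."""
    return all(c >= min_per_period for c in counts_per_period(ages, breaks))


breaks = [0.0, 1.0, 2.0, 6.0]       # hypothetical break ages in years
spread = [0.0, 0.5, 1.0, 5.9]       # measurements in every period
clumped = [0.0, 0.2, 0.4]           # three points, all in the first period

ok_spread = meets_rule(spread, breaks)
ok_clumped = meets_rule(clumped, breaks)
```

The clumped trajectory passes a criterion of three data points over the entire age range, yet it has no measurements in two of the three periods and fails the per-period rule. Setting `min_per_period=2` gives the stricter variant.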