11.5 Exercises

Exercise 11.1 (Potthoff-Roy, wide format imputation) Potthoff and Roy (1964) published classic data from a study of 16 boys and 11 girls in whom the distance (mm) from the center of the pituitary gland to the pterygomaxillary fissure was measured at ages 8, 10, 12, and 14. Changes in pituitary-pterygomaxillary distances during growth are important in orthodontic therapy. The goals of the study were to describe the distance in boys and girls as simple functions of age, and then to compare the functions for boys and girls. The data have been reanalyzed by many authors, including Jennrich and Schluchter (1986), Little and Rubin (1987), Pinheiro and Bates (2000), Verbeke and Molenberghs (2000) and Molenberghs and Kenward (2007).

Take the version from Little and Rubin (1987) in which nine entries have been made missing. The missing data have been created such that children with a low value at age 8 are more likely to have a missing value at age 10. Use mice() to impute the missing entries under the normal model using \(m = 100\).
For each missing entry, summarize the distribution of the 100 imputations. Determine the interquartile range of each distribution. If the imputations fit the data, how many of the original values do you expect to fall within this range? How many actually do?
Produce a lattice graph of the nine imputed trajectories that clearly shows the range of the imputed values.

Exercise 11.2 (Potthoff-Roy, comparison) Use the multiply imputed data from the previous exercise, and apply a linear mixed effects model with an unstructured mean and an unstructured covariance. See Molenberghs and Kenward (2007, ch. 5) for a discussion of the setup. Discuss the advantages and disadvantages of analyzing the multiply imputed data compared to direct likelihood.

Exercise 11.3 (Potthoff-Roy, long format imputation) Do this exercise with the complete Potthoff-Roy data. Warning: This exercise requires good data handling skills and some patience.

Calculate the broken stick estimates for each child using 8, 10, 12 and 14 as the break ages. Make a graph like Figure 11.6. Each data point has exactly one parameter, so the fit could be perfect in principle. Why doesn’t that happen? Which two children show the largest discrepancies between the data and the model?
Compare the age-to-age correlation matrix of the broken stick estimates to the original data. Why are these correlation matrices different?
How would you adapt the analysis so that the age-to-age correlation matrix of the broken stick estimates reproduces the age-to-age correlation matrix of the original data? Hint: Think of a simpler form of multilevel analysis.
Multiply impute the data according to the method used in Section 11.3.6, and produce a display like Figure 11.7 for children 1, 7, 20, 21, 22 and 24.
Compare the age-to-age correlation matrix from the imputed data to that of the original data. Are these different? How? Calculate the correlation matrix after deleting the data from the two children who showed the largest discrepancy in the broken stick model. Did this help?
How would you adapt the imputation method for the longitudinal data so that its correlation matrix is close to that of the original?