2.9 Exercises

Exercise 2.1 (Nomogram) Construct a graphic representation of Equation (2.27) that allows the user to convert between \(\lambda\) and \(\gamma\) for different values of \(\nu\). What influence does \(\nu\) have on the relation between \(\lambda\) and \(\gamma\)?
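A minimal starting point for the nomogram, sketched in base R. It assumes that Equation (2.27) is the relation \(\gamma = ((\nu+1)\lambda + 2)/(\nu+3)\), which follows from \(\lambda = (B + B/m)/T\) and \(\gamma = (r + 2/(\nu+3))/(1+r)\) with \(r = \lambda/(1-\lambda)\). Check this form against the equation in the text before relying on the chart. The helper names `gamma_from_lambda()` and `lambda_from_gamma()` and the chosen values of \(\nu\) are illustrative.

```r
# Assumed form of Eq. (2.27): gamma = ((nu + 1) * lambda + 2) / (nu + 3)
gamma_from_lambda <- function(lambda, nu) ((nu + 1) * lambda + 2) / (nu + 3)
# Inverse relation, for reading the chart in the other direction
lambda_from_gamma <- function(gamma, nu) ((nu + 3) * gamma - 2) / (nu + 1)

lambda <- seq(0, 1, by = 0.01)
nus <- c(1, 2, 5, 10, 30, 100)            # illustrative degrees of freedom
gam <- sapply(nus, function(nu) gamma_from_lambda(lambda, nu))  # one column per nu

matplot(lambda, gam, type = "l", lty = seq_along(nus), col = 1,
        xlim = c(0, 1), ylim = c(0, 1),
        xlab = expression(lambda), ylab = expression(gamma))
abline(0, 1, col = "grey")                # reference line gamma = lambda
legend("bottomright", legend = paste("nu =", nus),
       lty = seq_along(nus), bty = "n")
```

For example, `gamma_from_lambda(0.2, 10)` gives \((11 \times 0.2 + 2)/13 = 4.2/13 \approx 0.323\).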

Exercise 2.2 (Models) Explain the difference between the response model and the imputation model.

Exercise 2.3 (Listwise deletion) In the airquality data, predict Ozone from Wind and Temp. Now randomly delete half of the Wind values above 10 mph, and randomly delete half of the Temp values above 80\(^\circ\)F.

Are the data MCAR, MAR or MNAR?
Refit the model under listwise deletion. Do you notice a change in the estimates? What happens to the standard errors?
Would you conclude that listwise deletion provides valid results here?
If you add a quadratic term to the model, would that alter your conclusion?
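One way to set up the deletion mechanism is sketched below in base R. The seed, the object names, the use of `floor()` to define "half", and the choice of a quadratic term in Temp are illustrative assumptions, not part of the exercise. Note that Ozone is already incomplete in airquality, so the fit on the original data also drops rows.

```r
data(airquality)                           # base R data; Ozone already has NAs
fit_orig <- lm(Ozone ~ Wind + Temp, data = airquality)

set.seed(123)                              # illustrative seed
amp <- airquality
# Randomly set half of the Wind values above 10 mph to NA
idx <- which(amp$Wind > 10)
amp$Wind[idx[sample.int(length(idx), floor(length(idx) / 2))]] <- NA
# Randomly set half of the Temp values above 80 F to NA
idx <- which(amp$Temp > 80)
amp$Temp[idx[sample.int(length(idx), floor(length(idx) / 2))]] <- NA

# lm() drops incomplete rows (default na.action = na.omit): listwise deletion
fit_ld <- lm(Ozone ~ Wind + Temp, data = amp)
cbind(original = coef(fit_orig), listwise = coef(fit_ld))
cbind(original = sqrt(diag(vcov(fit_orig))),
      listwise = sqrt(diag(vcov(fit_ld))))   # standard errors

# One possible quadratic extension (hypothetical choice: Temp squared)
fit_ld2 <- lm(Ozone ~ Wind + Temp + I(Temp^2), data = amp)
```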

Exercise 2.4 (Number of imputations) Consider the nhanes dataset in mice.

Use the function ccn() to calculate the number of complete cases. What percentage of the cases is incomplete?
Impute the data with mice using the defaults and seed=1. Predict bmi from age, hyp and chl using the normal linear regression model, and pool the results. What are the proportions of variance due to the missing data for each parameter? Which parameters appear to be most affected by the nonresponse?
Repeat the analysis for seed=2 and seed=3. Do the conclusions remain the same?
Repeat the analysis with \(m=50\) using the same seeds. Would you prefer this analysis over those with \(m=5\)? Explain why.
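A minimal sketch of this workflow, assuming a recent mice version (3.x) in which `pool()` returns an object whose `pooled` component has the columns `term`, `estimate`, `lambda` and `fmi`. The complete-case count uses base R's `complete.cases()` so that the sketch does not depend on mice's helper functions. The helper `analyse()` is hypothetical.

```r
library(mice)                              # provides nhanes, mice(), pool()

n_cc <- sum(complete.cases(nhanes))        # number of complete cases
pct_incomplete <- 100 * (1 - n_cc / nrow(nhanes))

# Hypothetical helper: impute with the given seed and m (default methods),
# fit the linear model to each completed data set, and pool with Rubin's rules
analyse <- function(seed, m = 5, data = nhanes) {
  imp <- mice(data, m = m, seed = seed, print = FALSE)
  fit <- with(imp, lm(bmi ~ age + hyp + chl))
  pool(fit)$pooled[, c("term", "estimate", "lambda", "fmi")]
}

analyse(seed = 1)
lapply(1:3, analyse)                       # seeds 1, 2, 3 with m = 5
lapply(1:3, analyse, m = 50)               # same seeds with m = 50
```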

Exercise 2.5 (Number of imputations (continued)) Continue with the data from the previous exercise.

Write an R function that automates the calculations of the previous exercise. Let seed run from 1 to 100 and let m take the values m = c(3, 5, 10, 20, 30, 40, 50, 100, 200).
Plot the estimated proportion of variance due to the missing data (\(\lambda\)) for the age parameter against \(m\). Based on this graph, how many imputations would you advise?
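The automation can be sketched as a grid over seeds and \(m\). The helper `lambda_age()` and the reduced default grid are illustrative assumptions; the full grid in the exercise (100 seeds, and \(m\) summing to 458 per seed) is much more expensive to run. As in the previous sketch, it assumes a recent mice version whose pooled output has `term` and `lambda` columns. For nhanes2, where age is a factor, the model has one coefficient per non-reference age level, so the filter on `term` must be adapted.

```r
library(mice)

# Hypothetical helper: lambda for the age coefficient for one (seed, m) pair
lambda_age <- function(seed, m, data = nhanes) {
  imp <- mice(data, m = m, seed = seed, print = FALSE)
  fit <- with(imp, lm(bmi ~ age + hyp + chl))
  pooled <- pool(fit)$pooled
  pooled$lambda[pooled$term == "age"]
}

# Reduced grid for a quick run; the exercise uses seed = 1:100 and
# m = c(3, 5, 10, 20, 30, 40, 50, 100, 200)
grid <- expand.grid(seed = 1:10, m = c(3, 5, 10, 20))
grid$lambda <- mapply(lambda_age, seed = grid$seed, m = grid$m)

# Spread of lambda across seeds for each m
boxplot(lambda ~ m, data = grid, xlab = "m", ylab = "lambda (age)")
```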
Check White’s conditions 1 and 2 (cf. Section 2.8). For which \(m\) do these conditions hold?
Does this also hold for categorical data? Use the nhanes2 data to study this.

Exercise 2.6 (Automated choice of \(m\)) Write an R function that implements the methods discussed in Section 2.8.