This document is based on section 7.4 of the book ‘Flexible Imputation of Missing Data’ by Stef van Buuren.

This practical needs the mice library:

```r
library(mice)
```

## Item YA

Are you able to walk outdoors on flat ground?

1. Without any difficulty
2. With some difficulty
3. With much difficulty
4. Unable to do

## Item YB

Can you, fully independently, walk outdoors (if necessary with a cane)?

1. Yes, no difficulty
2. Yes, with some difficulty
3. Yes, with much difficulty
4. No, only with help from others

## Equating categories

We have two studies, A and B. YA has been measured in Study A, and YB has been measured in Study B.

Would it be a good idea just to equate the four categories?

Equating implicitly assumes that only the combinations (0, 0), (1, 1), (2, 2) and (3, 3) can occur, where response categories 1 to 4 are coded 0 to 3 in the data. Is that realistic?

## Imputation under independence

Let YA be the item of Study A, and let YB be the item of Study B. The comparability problem is a missing data problem, where YA is missing for population B and YB is missing for population A. This formulation allows us to use multiple imputation to solve the problem.

First, we create a small dataset with responses as follows:

```r
fA <- c(242, 43, 15, 0, 6)         # frequencies of population A
fB <- c(145, 110, 29, 8)           # frequencies of population B
YA <- rep(ordered(c(0:3, NA)), fA) # outcome item A population A
YB <- rep(ordered(c(0:3)), fB)     # outcome item B population B
```

Combine both datasets, with missing values for item YB in population A and missing values for item YA in population B. The data frame Y contains 598 rows and 2 columns: YA and YB.

```r
Y <- rbind(data.frame(YA, YB = ordered(NA)),
           data.frame(YB, YA = ordered(NA)))
dim(Y)
## [1] 598   2
head(Y)
##   YA   YB
## 1  0 <NA>
## 2  0 <NA>
## 3  0 <NA>
## 4  0 <NA>
## 5  0 <NA>
## 6  0 <NA>
tail(Y)
##       YA YB
## 593 <NA>  3
## 594 <NA>  3
## 595 <NA>  3
## 596 <NA>  3
## 597 <NA>  3
## 598 <NA>  3
md.pattern(Y)
##      YA  YB
## 292   0   1   1
## 300   1   0   1
##   6   0   0   2
##     298 306 604
```

There are no observations that link YA to YB, so the missing data pattern is unconnected. Moreover, 6 records contain no item data at all.
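The counts in the md.pattern() output can be checked directly in base R. The sketch below rebuilds Y with the same code as above and confirms the size of the data, the number of missing cells per item, and that no record observes both items.

```r
# Rebuild Y exactly as above (base R only; mice is not needed for this check)
fA <- c(242, 43, 15, 0, 6)
fB <- c(145, 110, 29, 8)
YA <- rep(ordered(c(0:3, NA)), fA)
YB <- rep(ordered(c(0:3)), fB)
Y <- rbind(data.frame(YA, YB = ordered(NA)),
           data.frame(YB, YA = ordered(NA)))

nrow(Y)                        # 598 = sum(fA) + sum(fB)
colSums(is.na(Y))              # missing cells per item: YA 298, YB 306
sum(complete.cases(Y))         # 0: no record observes both YA and YB
sum(rowSums(is.na(Y)) == 2)    # 6 records without any item data
```

The zero from complete.cases() is the unconnected pattern in a single number: nothing in Y tells the imputation model how YA and YB are related.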

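The following chunk is a bit of specialty code that defines two functions. The function micemill() runs n additional iterations of the MICE algorithm and, after each iteration, calculates Kendall's $\tau$ (rank order correlation) between the imputed versions of YA and YB in every imputed dataset. Note that micemill() updates the objects imp and tau in the global environment (via <<-). The function ra() is a small helper that puts the results of the analyses in proper shape, so that each iteration adds one row to tau.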

```r
micemill <- function(n) {
  for (i in 1:n) {
    imp <<- mice.mids(imp)                     # one more iteration of the algorithm
    cors <- with(imp, cor(as.numeric(YA),
                          as.numeric(YB), method = "kendall"))
    tau <<- rbind(tau, ra(cors, simplify = TRUE))  # one row per iteration
  }
}

ra <- function(x, simplify = FALSE) {
  if (!is.mira(x)) return(NULL)
  ra <- x$analyses
  if (simplify) ra <- unlist(ra)
  return(ra)
}
```

The following code imputes the missing data in Y under the (dubious) assumption that YA and YB are mutually independent.

```r
tau <- NULL
imp <- mice(Y, maxit = 0, m = 10, printFlag = FALSE, seed = 32662)
micemill(25)

# define a function to plot tracelines of Kendall's tau
plotit <- function() matplot(x = 1:nrow(tau), y = tau,
                             ylab = expression(paste("Kendall's ", tau)),
                             xlab = "Iteration", type = "l", lwd = 1,
                             lty = 1:10, col = "black")
plotit()
```

The plot shows the traces of $\tau$ for the 10 imputed datasets over 25 iterations. The traces start near zero, but then wander freely over a substantial range of the correlation. The MICE algorithm does not know where to go, and wanders pointlessly through the parameter space. This occurs because the data contain no information about the relation between YA and YB, so $\tau$ can be anything.

## Why we cannot simply equate categories

Suppose that we have a third, external study E in which both YA and YB are measured. The cross-tabulation of the two items in study E is (rows are YA, columns are YB; the NA row holds two respondents without a score on YA):

| YA \ YB | 0 | 1 | 2 | 3 | Total |
|---|---|---|---|---|---|
| 0 | 128 | 45 | 3 | 2 | 178 |
| 1 | 13 | 45 | 10 | 0 | 68 |
| 2 | 3 | 20 | 14 | 5 | 42 |
| 3 | 0 | 0 | 1 | 1 | 2 |
| NA | 1 | 0 | 1 | 0 | 2 |
| Total | 145 | 110 | 29 | 8 | 292 |

The contingency table shows a strong relation between YA and YB. However, the relation is far from perfect, so simply equating the four categories of YA and YB will distort their relationship. Note that the table is not symmetric: respondents tend to score lower on YA than on YB, indicating that YA is more difficult than YB.

Simple equating assumes 100% concordance of the pairs. The contingency table clearly shows that this is not the case in study E: only 188 of the 290 complete pairs lie on the diagonal. On the surface, the four response categories of YA and YB may look similar, but the information from study E suggests that the items work differently in a systematic way.

## Imputation using a bridge study

Is there a way to incorporate the relationship between YA and YB so that they become comparable? The answer is yes. We can redo the imputation, but now with sample E added to the data. In this way, study E acts as a bridge study.
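To see what adding study E does to the data layout, here is a minimal base-R sketch that assembles a three-study dataset from the frequency tables shown in this document. It is an illustration under stated assumptions, not how the walking data were built: the data frame name bridge is hypothetical, and sex and age are left out. The records from study E are the only ones that observe both items, so they are the only source of information about $\tau$.

```r
# Hypothetical three-study layout (A, B, E) rebuilt from the frequency
# tables in this document; sex and age are omitted.
fA <- c(242, 43, 15, 0, 6)   # Study A: YA = 0, 1, 2, 3, NA
fB <- c(145, 110, 29, 8)     # Study B: YB = 0, 1, 2, 3
A <- data.frame(YA = rep(c(0:3, NA), fA), YB = NA, src = "A")
B <- data.frame(YA = NA, YB = rep(0:3, fB), src = "B")

# Study E: counts of (YA, YB) pairs, rows = YA, columns = YB
m <- matrix(c(128, 45,  3, 2,
               13, 45, 10, 0,
                3, 20, 14, 5,
                0,  0,  1, 1), nrow = 4, byrow = TRUE)
cells <- expand.grid(YA = 0:3, YB = 0:3)  # YA varies fastest, like as.vector(m)
E <- cells[rep(seq_len(nrow(cells)), as.vector(m)), ]
E <- rbind(E, data.frame(YA = NA, YB = c(0, 2)))  # the two E records without YA
E$src <- "E"

bridge <- rbind(A, B, E)
bridge$YA <- factor(bridge$YA, levels = 0:3, ordered = TRUE)
bridge$YB <- factor(bridge$YB, levels = 0:3, ordered = TRUE)

nrow(bridge)                                # 890 records
linked <- complete.cases(bridge[, c("YA", "YB")])
sum(linked)                                 # 290 records observe both items

# Kendall's tau in the linked records, computed as in micemill()
tau_E <- cor(as.numeric(bridge$YA[linked]), as.numeric(bridge$YB[linked]),
             method = "kendall")
tau_E
```

Dropping the E rows would make linked all FALSE, which is exactly the unconnected pattern found for Y.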
The relevant data are included in the mice package as the dataset walking.

```r
head(walking)
##      sex age YA   YB src
## 1   Male  61  1 <NA>   A
## 2 Female  69  1 <NA>   A
## 3   Male  74  0 <NA>   A
## 4   Male  66  0 <NA>   A
## 5 Female  72  2 <NA>   A
## 6   Male  67  0 <NA>   A
table(walking$src)
##
##   A   B   E
## 306 292 292
```
```r
with(walking, table(YA, YB, src, useNA = "always"))
## , , src = A
##
##       YB
## YA       0   1   2   3 <NA>
##   0      0   0   0   0  242
##   1      0   0   0   0   43
##   2      0   0   0   0   15
##   3      0   0   0   0    0
##   <NA>   0   0   0   0    6
##
## , , src = B
##
##       YB
## YA       0   1   2   3 <NA>
##   0      0   0   0   0    0
##   1      0   0   0   0    0
##   2      0   0   0   0    0
##   3      0   0   0   0    0
##   <NA> 145 110  29   8    0
##
## , , src = E
##
##       YB
## YA       0   1   2   3 <NA>
##   0    128  45   3   2    0
##   1     13  45  10   0    0
##   2      3  20  14   5    0
##   3      0   0   1   1    0
##   <NA>   1   0   1   0    0
##
## , , src = NA
##
##       YB
## YA       0   1   2   3 <NA>
##   0      0   0   0   0    0
##   1      0   0   0   0    0
##   2      0   0   0   0    0
##   3      0   0   0   0    0
##   <NA>   0   0   0   0    0
```
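The slice for src = E is the table of study E again. A short base-R sketch, with the counts copied from that slice, makes the argument against equating concrete: equating only allows the diagonal cells, and the off-diagonal pairs are not balanced between the two triangles of the table.

```r
# Study E counts copied from the src = E slice above
# (rows = YA, columns = YB; the two records with YA missing are left out)
m <- matrix(c(128, 45,  3, 2,
               13, 45, 10, 0,
                3, 20, 14, 5,
                0,  0,  1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(YA = 0:3, YB = 0:3))

n_pairs    <- sum(m)                # 290 complete pairs
concordant <- sum(diag(m))          # 188 pairs with YA == YB (allowed by equating)
YB_higher  <- sum(m[upper.tri(m)])  # 65 pairs with YB > YA
YA_higher  <- sum(m[lower.tri(m)])  # 37 pairs with YA > YB
discordant <- n_pairs - concordant  # 102 pairs that equating rules out
round(discordant / n_pairs, 2)      # 0.35
```

About a third of the pairs in study E would be impossible under simple equating, and most of them have the higher score on YB.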

The missing data pattern of the combined dataset of populations A, B and E:

```r
md.pattern(walking)
##     sex age src  YA  YB
## 290   1   1   1   1   1   0
## 294   1   1   1   0   1   1
## 300   1   1   1   1   0   1
##   6   1   1   1   0   0   2
##       0   0   0 300 306 606
```

Now, for 290 subjects we have scores on both YA and YB (from bridge study E).

Multiple imputation on the dataset walking can now be done as follows:

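```r
tau <- NULL
# dry run (no iterations) to obtain the default predictor matrix
imp <- mice(walking, maxit = 0, m = 10, seed = 92786)
pred <- imp$predictorMatrix
# do not use src, age and sex as predictors
pred[, c("src", "age", "sex")] <- 0
imp <- mice(walking, maxit = 0, m = 10, seed = 92786, predictorMatrix = pred)
micemill(20)
plotit()
```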
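The line that zeroes columns of pred decides which variables serve as predictors. A base-R sketch of the idea follows, starting from a hypothetical all-ones matrix with a zero diagonal. That starting point is an assumption: the default predictor matrix that mice builds for walking may differ in detail. Rows are the variables to be imputed and columns are the predictors, so zeroing the src, age and sex columns leaves YA imputed from YB only, and YB from YA only.

```r
# Hypothetical predictor matrix for the columns of walking:
# rows = variables to be imputed, columns = candidate predictors,
# 1 = use this column as a predictor, 0 = do not use it.
vars <- c("sex", "age", "YA", "YB", "src")
pred <- matrix(1, nrow = 5, ncol = 5, dimnames = list(vars, vars))
diag(pred) <- 0                       # a variable never predicts itself

# The same edit as in the code above: drop src, age and sex as predictors
pred[, c("src", "age", "sex")] <- 0

pred["YA", ]   # only the YB entry is 1: YA is imputed from YB
pred["YB", ]   # only the YA entry is 1: YB is imputed from YA
```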