What to do about "dimnames of 'x' contains duplicates" errors
The fullmatch and pairmatch functions use the dimnames of the distance matrix or other distance object to produce a named factor that records which units are matched to each other. If these names are missing or damaged in some way, the user will see the following error:
Error in fullmatch.matrix(distanceMatrix, data = originalData) :
dimnames of argument 'x' contain duplicates
Here x refers to the distanceMatrix object the user created.
There are two ways this error can arise: 1) there are duplicates in the names of the original data or distance matrix, or 2) there are missing entries in the treatment indicator variable. The following examples demonstrate both issues and provide solutions.
The following preamble sets up the R environment with the basic propensity score matching problem:
set.seed(201406019)
library(optmatch)
data(nuclearplants)
pmod <- glm(pr ~ ., data = nuclearplants, family = binomial)
scores <- predict(pmod)
n <- length(scores)
names(scores) <- make.unique(sample(LETTERS, n, replace = TRUE))
If there are duplicates in the names of treated or control units, fullmatch and pairmatch cannot return a named factor that uniquely determines which unit matches which. If unit A appears twice in the data, and the named factor states that A is a member of both group 1 and group 2, how does the user know which of the two As is in which of the two groups?
For example, suppose we accidentally duplicated a name in scores:
badscores <- scores
names(badscores)[1] <- names(badscores)[2]
d.bs <- match_on(badscores, z = nuclearplants$pr)
x <- fullmatch(d.bs, data = badscores)
Error in fullmatch.matrix(d.bs, data = badscores) :
dimnames of argument 'x' contain duplicates
This can be fixed by making the names unique:
names(badscores) <- make.unique(names(badscores))
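Before building a distance, it can help to check which names are repeated. The following is a minimal base R sketch on a toy vector; toy and its values are made up for illustration and stand in for badscores.

```r
# Toy named vector standing in for `badscores`; names and values are made up.
toy <- c(A = 0.2, A = 1.1, B = -0.4, C = 0.7)

# Which names occur more than once?
dup_names <- unique(names(toy)[duplicated(names(toy))])
dup_names

# make.unique() appends a numeric suffix to each repeated name.
names(toy) <- make.unique(names(toy))
names(toy)

# anyDuplicated() returns 0 when every name is unique.
stopifnot(anyDuplicated(names(toy)) == 0)
```

Note that a distance built before the rename still carries the old names in its dimnames, so it would need to be rebuilt with match_on before calling fullmatch again.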
When the variable indicating whether a unit received treatment or control has a missing value (NA), this message can also appear. This tends to happen when distance matrices are created using the outer function. Using optmatch's built-in match_on function will often catch this problem earlier in the process with a more suitable error message.
Let's see an example of the error:
badz <- nuclearplants$pr
badz[1] <- NA
d.bz <- abs(outer(scores[badz == 1], scores[badz == 0], `-`))
x <- fullmatch(d.bz, data = scores)
Error in fullmatch.matrix(d.bz, data = scores) :
dimnames of argument 'x' contain duplicates
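To see where the duplicates come from, here is a small base R sketch on toy data (toy_scores and toy_z are made-up stand-ins for scores and badz). Indexing with a logical vector that contains NA returns an element whose name is NA, and outer copies that name into both the row and the column names. This appears to be why the duplicate check fails, although the exact check inside fullmatch is not shown here.

```r
# Toy data; the names and values are made up for illustration.
toy_scores <- c(a = 0.1, b = 0.5, c = 0.9, d = 1.3)
toy_z <- c(NA, 1, 0, 0)  # the first unit has a missing treatment indicator

# Logical indexing with NA returns an NA element whose name is also NA.
treated <- toy_scores[toy_z == 1]
control <- toy_scores[toy_z == 0]

# outer() copies those names into the dimnames of the result.
toy_d <- abs(outer(treated, control, `-`))
dimnames(toy_d)

# The NA name appears among both the row names and the column names.
all_names <- unlist(dimnames(toy_d))
has_duplicates <- anyDuplicated(all_names) > 0
has_duplicates
```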
This error would have been caught earlier if the match_on function had been used to create the distance matrix instead of outer (when there is no problem with the treatment indicator, the two methods return the same result):
d.bz <- match_on(scores, z = badz)
Error in toZ(z) : NAs not allowed in treatment indicator.
For either approach, the solution is to build distances only on the subset of the data that has valid treatment indicators:
goodz <- badz[!is.na(badz)]
scores2 <- scores[!is.na(badz)]
d.gz <- match_on(scores2, z = goodz)
x <- fullmatch(d.gz, data = scores2)
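Both causes can be screened for before any distance is built. The helper below is a hypothetical sketch in base R, not part of optmatch; the name check_match_inputs and its messages are assumptions made for illustration.

```r
# Hypothetical helper (not part of optmatch): returns a character vector
# describing problems with a named score vector `x` and treatment indicator `z`.
check_match_inputs <- function(x, z) {
  problems <- character(0)
  if (is.null(names(x))) {
    problems <- c(problems, "x has no names")
  } else if (anyDuplicated(names(x)) > 0) {
    problems <- c(problems, "x has duplicated names")
  }
  if (length(z) != length(x)) {
    problems <- c(problems, "x and z differ in length")
  }
  if (anyNA(z)) {
    problems <- c(problems, "z contains NA")
  }
  problems
}

# Toy inputs; names and values are made up.
bad <- check_match_inputs(c(A = 0.2, A = 1.1), c(1, NA))
good <- check_match_inputs(c(A = 0.2, B = 1.1), c(1, 0))
bad
good
```

An empty result means neither of the two problems described on this page was detected; it does not guarantee that matching will succeed for other reasons.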