-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove idle "(element weights)" entries in xbal objects #85
Comments
Probably the answer is that they shouldn't be presented there. However, I recall leaving them in because I thought there might be some sort of division-by-zero types edge cases where they'd show as something other than 1.00, and we'd want to keep the user informed. I suspect that now I could think of a better handlings for those edge cases -- indeed, perhaps there are connections to @markmfredrickson's and my work on subgroups and balance. To do:
(My leftover cruft in the xbal objects was an impediment to @josherrickson's start on the ggbalance branch, disussed in #84 comments.) |
Here's a clue, I think, about what "(element weights)" should in some way be reflecting on:
(from my commit notes to [master 2672306]). |
What I think I need to do is
|
A commit I'm planning to push up in a day or two ("Changed meaning of 1st col NotMissing slots, NM.* slots"; should get a mention in comment thread to this issue) changes both the name and the meaning of this entry:
Purpose: suppose an RCT with some units lost to follow up before their covariates could be collected. These units were a part of the design, so we don't want them deleted before our calculations; if they had been deleted in this way then our inferences of first and second order assignment probabilities would be thrown off. But we should also be tracking imbalance on this form of data missingness. Thus, their inclusion. Does this reasoning justify presenting the user with "
(Additional comment: After working through test failures associated with this change, I see that I had managed to mislead myself about the purpose of " |
I like those action steps and the instinct to be clear about treatment of
missingness.
…On Tue, Dec 5, 2017 at 9:20 AM, Ben ***@***.***> wrote:
A commit I'm planning to push up in a day or two ("Changed meaning of 1st
col NotMissing slots, NM.* slots"; should get a mention in comment thread
to this issue) changes both the name and the meaning of this entry:
- "Intercept" |--> "_non-null record_"
- The column gets a TRUE iff record in question has a non-NA entry in
at least one field of Covariates.
Purpose: suppose an RCT with some units lost to follow up before their
covariates could be collected. These units were a part of the design, so we
don't want them deleted before our calculations; if they had been deleted
in this way then our inferences of first and second order assignment
probabilities would be thrown off. But we should also be tracking imbalance
on this form of data missingness. Thus, their inclusion.
Does this reasoning justify presenting the user with "(_non-null record_)"
entries in tables and displays, even when there's no variation in this
computed variable? I had been thinking that this was just one step along
the way to cleanup of those displays, with the next step being to excise it
from the displays in those cases. But maybe there's a benefit to reminding
the user that balanceTest and similar functions aren't afraid of seeing
null records, and have a use for them. In addition, if it's always there in
the outputs of balanceTest() and similar functions, that simplifies
programming with those functions. If we have print and plot methods for
those objects, they might be set up to have an option to take these entries
out, and/or we might facilitate that by always presenting them last. Action
steps:
- Get some experience looking at tables with "(_non-null record_)"
entries
- If those entries don't seem too weird, close this issue out w/o the
removal of "(Intercept)"/"(_non-null record_)" entries I called for in
comments earlier in this thread.
(Additional comment: After working through test failures associated with
this change, I see that I had managed to mislead myself about the purpose
of "elements weights"/"Intercept", through ambiguous inline comments.
That column wasn't there to address any specific statistical edge case, it
was there to ensure that the NotMissing slot would always have at least one
column, even when there was no missingness, so that the transformation of
it in aggregateDesigns and ops using it in alignDesignsByStrata,
alignedToInferentials could work in a uniform way whether there was
missingness or not. The same comment referenced above touches up those
inline comments a little bit; hopefully with this column also gaining some
meaning and purpose at the same time, similar ambiguities won't come back
to bite us subsequently. )
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#85 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAiyTFbh80lTJwJW7minzFSFrW6RAktZks5s9V8ngaJpZM4N-oKd>
.
|
Commit ae0d5a7 removes these entries when they're idle, ie when Treatment and Control means on But in the process of this work I noticed a problem with the edge case of just 1 covariate that I don't presently have time to address; instead recording it here. In n <- 20
df <- data.frame(Z = rep(c(0,1), n/2),
X1 = rnorm(n),
X2 = rnorm(n))
df$X1[1:3] <- NA
bt1 <- balanceTest(Z ~ X1, data = df)
> bt1
strata Unstrat
stat std.diff z
vars
X1 0.13 0.27
(_non-null record_) NaN NA The bt2 <- balanceTest(Z ~ X1 + X2, data = df)
> bt2
strata Unstrat
stat std.diff z
vars
X1 0.13 0.27
X2 1.00 2.03
(X1) 0.27 0.61
(_non-null record_) NaN NA The fact that there is probably a by-product of X1 being the only variable, but I haven't looked into it in any detail. Commit ae0d5a7 includes a commented-out test that we can enable once this matter is addressed: expect_s3_class(bt1, "xbal")
## expect_true("(X1)" %in% dimnames(bt1[["results"]])[["vars"]])
expect_false("(_non-null record_)" %in% dimnames(bt1[["results"]])[["vars"]])
Another to-do: give an example of a setup with " |
I'm hopeful that this takes care of the main issue, but it does leave at least 2 loose ends; see commented-out line of test.balanceTest.R, comments to #85.
Preparing to push up some relevant commits on the i90-broom branch:
In the examples given in that last comment we're now seeing: > bt1
strata(): --
stat std.diff z
vars
X1 0.13 0.27
(_any Xs recorded_) 0.27 NA
> bt2
strata(): --
stat std.diff z
vars
X1 0.13 0.27
X2 1.00 2.03
(X1) 0.27 0.61 (The fact that there's no " |
Never got around to an example w/ " |
The name "
Notice how much horizontal space the variables column consumes, all b/c of this long label. (Additionally in some contexts the underscores mean it renders in italics. Not sure whether this is bug or feature.) For a replacement how about " |
Note to self:
Except that presently, this isn't such an example, although it should be. Both treatment and control groups' means on dat = matrix(c( 0,1,NA, 0,NA,NA, 1,1,2 ),
3,3,
byrow=T,
dimnames=list(1:3, c("z", "x1", "x2"))
)
dat <- as.data.frame(dat)
balanceTest(z~x1+x2, dat=dat, report="all")[['results']]
, , strata = --
stat
vars Control Treatment std.diff adj.diff pooled.sd z p
x1 1 1 NaN 0 0 0.707 0.959
x2 NaN 2 NaN NaN NaN 0.707 0.959
(x2) 0 1 Inf 1 0 1.414 0.472
attr(,"originals")
[1] 1 2 0
attr(,"term.labels")
[1] "x1" "x2"
attr(,"include.NA.flags")
[1] TRUE Dropping inside the balT call in order to pull out the adjusted difference of totals generates a more specific indication of what should have been shown: cbind(descriptives[,c(1,2,4),1, drop=T], adj.diff.of.totals=tmp$`--`$adj.diff.of.totals)
Control Treatment adj.diff adj.diff.of.totals
x1 1 1 0 0.5
x2 NaN 2 NaN 1.0
(x2) 0 1 1 1.0
(_non-null record_) 1 1 0 0.5 |
Element weights are used to normalize the figures that get compared in
balanceTest
calcs. By definition, they'll ordinarily show up as "1.00", raising the question of why they're being presented in tables and plots.Reopening b/c the solution may have gone too far. See comments below. Descriptives calculation for
(_non-null record_)
row seems to need revisiting.The text was updated successfully, but these errors were encountered: