switch plot.xbal to ggplot2 #84
Comments
I would support this. I always wanted to like lattice graphics, but could never really get them to do what I wanted, so I went with base on the last rewrite. As I recall, the big reason was that I wanted to group dummies from factor variables. If we can do that with ggplot, then I'm all for it.
ggplot, by default, sorts the y-axis alphabetically, so as long as you don't have an oddity like a variable named "XYc" and a factor variable named "XY" with levels "a", "b" and "d", it will group all factors together.
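To make the alphabetical-ordering point concrete, here is a small base R sketch. The variable names are made up for illustration, and it assumes the axis follows the same sorted order that `factor()` gives a character vector. It shows the "XYc" oddity and one possible way around it, setting the levels explicitly.

```r
# Hypothetical row labels: dummies for a factor "XY" with levels a, b, d,
# plus an unrelated variable that happens to be named "XYc".
vars <- c("Age", "XYd", "XYc", "XYa", "XYb")

# factor() sorts the unique values, so "XYc" lands between the XY dummies.
alphabetical <- levels(factor(vars))
print(alphabetical)

# Setting the levels explicitly keeps the factor's dummies together.
grouped <- factor(vars, levels = c("Age", "XYa", "XYb", "XYd", "XYc"))
print(levels(grouped))
```

With explicit levels the grouping no longer depends on how the variables happen to be named.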
I would also support this as long as the factor grouping could happen nicely. I agree with wanting to like but not really liking lattice graphics. I might advocate
@markmfredrickson @rstudley FYI I think we're going to need to address this issue in the process of RCT work on EE. (Everybody else, sorry for being cryptic.)
@benthestatistician To un-cryptic that a bit, are you implying this will be handled elsewhere and we can close this issue? Or that it would be beneficial to move ahead with this idea?
Ben, I’m not sure why you included me here. Do you need my input?
@josherrickson I wasn't intending to imply anything one way or another about where the issue will be handled, rather to inform @markmfredrickson and @rstudley that the work being discussed under this thread connects to another stream of upcoming work that they (and I) will need to plan for. I just (cryptically) linked this issue to another one, leaving open the question of who should grab this issue and run with it. The two issues are
In support of my yes to (1), I think that switching over to ggplot2 will make the plotting function easier to maintain. I defer to Jake and others on aesthetics, so if he's happy (after we accept his

@josherrickson would you be willing to start us off with a new branch implementing a ggplot2-based

When Mark turns his attention back toward EE implementation, with @rstudley's blessing I might ask him to contribute a little (<=5hrs) to whatever may remain. Even if these contributions won't necessarily get incorporated into EE immediately, I think the potential savings in coder time for the EE are sufficiently large that having Mark devote a bit of his time on that project to this effort is likely to save money for the EE project, on balance. (After an hour or two of working on the new implementation, he may well find that with a couple more hours spent adapting this work to EE he could spare himself another 10 hours that would have otherwise had to be spent.)
Sounds like you want me to agree to up to five hours of Mark’s time to invest in a potential graphical improvement for EE. I agree, on the very mild conditions that:
1. Mark agrees that this is more likely than not to benefit EE, and
2. Mark’s priority remains dealing with any issues that arise due to the EE hosting migration.
Threw together a quick function here: 387950c.
Notes:
You can even override parts which were defined in the function.
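As a rough illustration of the override point, here is a sketch of a function that returns a ggplot object. To be clear, `balance_plot()` and the data frame `bal` are hypothetical stand-ins, not the code in 387950c. Because the return value is an ordinary ggplot object, anything added afterwards with `+` takes precedence over what the function set.

```r
library(ggplot2)

# Hypothetical balance summary: one standardized difference per variable.
bal <- data.frame(
  variable = c("Age", "XYa", "XYb", "XYd"),
  std_diff = c(0.20, -0.05, 0.10, -0.15)
)

# Hypothetical stand-in for a ggplot2-based plot method.
balance_plot <- function(d) {
  ggplot(d, aes(x = std_diff, y = variable)) +
    geom_vline(xintercept = 0, linetype = "dashed") +  # reference line at zero
    geom_point() +
    labs(x = "Standardized difference", y = NULL) +
    theme_bw()
}

p <- balance_plot(bal)

# Later additions replace the label and theme chosen inside the function.
p2 <- p + labs(x = "Std. diff.") + theme_minimal()
```

Printing `p2` draws the plot with the new axis label and theme, while `p` itself is left unchanged.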
👍 🥇 Thanks @josherrickson! I don't think the dependencies should be a problem. (Even in the EE, Mark's done some nice work to make it easier to keep an R installation current.)
Would there be a reasonably easy way to represent groupings of variables on these plots, by adding
It would be something along the lines of what you see here: (Ignore "Outcome sensitivity" stuff at the bottom, at least for now.) (Aside: current plans for #85 call for renaming "(element weights)" and removing it most of the time but not all of the time.)
I can think of two ways to do that.
I think 2 is more likely to be an easier implementation, but I'd be hesitant to do it: any way I can think of actually implementing it would be fragile.
I rethought 2., and think it might be feasible. I added some preliminary work to the branch in f2caf76. What remains is associating each "variable" (in quotes because the variables now include the group labels) with a y-value. Perhaps something like adding additional spacing above each group label?

Edit: The reason I rethought it was that ggplot is less fickle about plotting region and margins than base R - it should handle things properly in general.
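One possible way to do the y-value association, sketched in base R. The row labels, the `is_group` flag and the `gap` size are all assumptions for illustration, not what f2caf76 does. Each row steps down by one unit, and every group label after the first gets some extra space above it.

```r
# Hypothetical rows in top-to-bottom plotting order; group labels are rows too.
rows <- data.frame(
  label    = c("Demographics", "age", "sex", "Region", "north", "south"),
  is_group = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

gap <- 0.5  # assumed extra space above each group label, in row units

# Every row steps down by 1; group labels step down by 1 + gap.
step <- 1 + gap * rows$is_group
step[1] <- 1  # no extra space above the first group label
rows$y <- -cumsum(step)
print(rows)
```

The resulting `y` values could then go on a continuous axis, with `rows$y` as the breaks and `rows$label` as the labels, so the spacing does not rely on margin arithmetic.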
@josherrickson in an offline conversation you mentioned that tests were failing on the master branch in a way that was blocking your progress. When I run the tests on the master branch, the only failure I'm seeing is
more specifically, here:
Can you confirm that this is the trouble you have in mind? (It happens that I don't have an RSVGTips on this computer, so it's skipping those tests, with a warning. If you're seeing issues related to RSVGTips, I'd encourage you to set them aside if possible.)
I get that error, an additional RSVGTips error, and several warnings. The warnings may be specific to my system.
Exciting progress!
If you give it a data set w/ partial missingness on, say, "var", then `balanceTest()` should return an array w/ a row "(var)", for missingness on "var".
1. Does it?
2. Can we make sure that that row appears adjacent to the "(_non-null record_)" row?
3. I had imagined all of these missingness rows being at the bottom. But I can also envision an argument for collecting them at the top. Maybe with an example of what it looks like with multiple non-missingness rows we'll be in a good position to decide which of these alternatives are suitable.
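To compare the two layouts in item 3, here is a base R sketch. The row names are hypothetical, and it assumes that any name wrapped in parentheses, including "(_non-null record_)", counts as a missingness-type row. Collecting those rows at either end also keeps "(var)" rows next to "(_non-null record_)", as asked in item 2.

```r
# Hypothetical row names; "(X1)" stands for a missingness indicator on X1.
vars <- c("X1", "(X1)", "X2", "(_non-null record_)", "X3", "(X3)")

# Treat any name wrapped in parentheses as a missingness-type row.
is_missing_row <- grepl("^\\(.*\\)$", vars)

# Option A: collect the missingness rows at the bottom.
at_bottom <- c(vars[!is_missing_row], vars[is_missing_row])

# Option B: collect them at the top.
at_top <- c(vars[is_missing_row], vars[!is_missing_row])

print(at_bottom)
print(at_top)
```

Either vector could be used as explicit factor levels for the row labels, which would fix the order on the axis.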
@benthestatistician There is a blocking issue right now related to missing data in
Here is an updated version of the plot now that issue #92 has been resolved. I fixed a small bug that was causing the missing indicator to appear in a different variable group. As you can see, the (X1) column appears along with X1. Comments?
Interesting. As noted above my thinking had been to group all the missingness indicator vars together amongst themselves, perhaps as their own variable group. But perhaps we should discuss and consider a bit before going down that route. Here are my thoughts/reactions about the current rendering scheme:
Right now `plot.xbal` has a lot of issues (see e.g. #82, #80, and another unreported issue that vertical margins look wonky). Have we considered moving to ggplot2 instead of base R plotting? It would address a lot of the concerns immediately. We can get 90% of the way there very quickly: