broom-compatible form & content of balanceTest output #90

Open
benthestatistician opened this issue Dec 6, 2017 · 9 comments

Comments

@benthestatistician
Collaborator

benthestatistician commented Dec 6, 2017

Via the broom package, the "tidyverse" movement is pushing towards model outputs that can be summed up in terms of vectors about as long as the model coefficient vector (broom::tidy()), vectors about as long as the fitted values or residuals (broom::augment()), and whole-model summaries analogous to R^2 (broom::glance()).

Let's have tidy and glance methods for xbal objects (and/or a successor class). Let's also arrange things so that they and print.xbal() share code, e.g. by having print.xbal() internally call tidy.xbal() and glance.xbal() for some or all of the formatting embedded in the "results" and "overall" portions of the display.

(In particular I've recently started to code into print.xbal special formatting rules that format by row to some extent and by column elsewhere. I think tidy.xbal() should emulate the unusual by-row formatting, for consistency; we might as well use the same routines in both places.)
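To make the code-sharing idea concrete, here is a minimal sketch in base R. Everything in it is an assumption for illustration: the class name toy_xbal, its two-element layout, and the numbers are made up and are not the structure of real xbal objects. The tidy() and glance() generics are defined locally as stand-ins for the ones broom provides, so the sketch runs without any packages.

```r
# Stand-ins for the generics that broom provides; defined locally so the
# sketch runs without any package installed.
tidy   <- function(x, ...) UseMethod("tidy")
glance <- function(x, ...) UseMethod("glance")

# Hypothetical toy object with made-up numbers: one row per covariate
# plus a one-row overall test. Not the real xbal layout.
toy <- structure(
  list(
    results = data.frame(
      vars      = c("x1", "x2"),
      Control   = c(10.2, 0.40),
      Treatment = c(9.8, 0.45),
      statistic = c(-1.10, 0.60),
      p.value   = c(0.27, 0.55),
      stringsAsFactors = FALSE
    ),
    overall = data.frame(chisquare = 1.6, df = 2, p.value = 0.45)
  ),
  class = "toy_xbal"
)

# One row per variable, about as long as a coefficient vector.
tidy.toy_xbal <- function(x, ...) {
  out <- x$results
  rownames(out) <- NULL
  out
}

# One row for the whole object, analogous to an R^2-style summary.
glance.toy_xbal <- function(x, ...) {
  out <- x$overall
  rownames(out) <- NULL
  out
}

# print() delegates to the two methods, so the display and the tidy
# output are built from the same code.
print.toy_xbal <- function(x, digits = 3, ...) {
  cat("Results:\n")
  print(format(tidy(x), digits = digits), row.names = FALSE)
  cat("Overall:\n")
  print(format(glance(x), digits = digits), row.names = FALSE)
  invisible(x)
}

print(toy)
```

With this arrangement any formatting rule added to the tidy method shows up in the printed display as well, which is the consistency argued for above.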

@benthestatistician
Collaborator Author

Without having thought this through, I suspect that we could organize things in such a way that a broom::augment method could be used to furnish the essence of related but thematically separate calculations. (E.g., null covariance matrix of imbalance statistics.)

benthestatistician added a commit that referenced this issue Jun 26, 2018
Starting a new branch for work on #90, in anticipation of
commits to come that may potentially be somewhat disruptive.
@benthestatistician
Collaborator Author

benthestatistician commented Jun 27, 2018

Notes to self:

  1. We may be able to deploy these methods inside of print.xbal().

  2. Not strictly on-issue, but related: [master f4408b7] somewhat misplaced the material to handle rounding of orig_units_columns values. It should happen prior to the block bearing the condition

    if (show.signif.stars && !show.pvals && !is.null(theresults) && hasP )
    

so that the output gets modified regardless of whether that condition holds. For an example of the problem, see tests added in [i90-broom 1f71fc0] (one of which currently fails). Fixing will call for a bit of compensatory surgery.

@benthestatistician benthestatistician changed the title broom-compatible form & content of balanceTest output? broom-compatible form & content of balanceTest output Jun 27, 2018
@benthestatistician
Collaborator Author

Follow-ups:

  1. Re reuse of code in tidy.xbal() and in print.xbal(): [RItools:::original_units_var_formatter()] does this.
  2. I took care of the misplaced orig_units_columns material in [i90-broom 988a5d5].

Remaining to-dos:

  • Add a strata column to tidy.xbal output
  • Ditto for glance.xbal output. While at it, remove row names from this output and adjust tests

benthestatistician added a commit that referenced this issue Jun 28, 2018
Fixes a problem from f4408b7 . See also  1f71fc0 , comments to #90.
(While I was at it I adjusted internal variable names, maybe making
it a bit easier to read.)
@benthestatistician
Collaborator Author

If given a non-null varnames_crosswalk argument, tidy.xbal() moves the renamed columns to the end of the data frame. It would be better to leave them in place. Example:

<stuff> %>%                                 
 + tidy.xbal(format=T, digits=3)                                                       
                      vars Control Treatment adj.diff std.diff pooled.sd
 1                grdavg13    99.7      77.7    -22.0  -0.4845      45.5
 2                pctwht13   0.472     0.450  -0.0223  -0.0649     0.343
                   NA.info statistic  p.value
 1                            -2.200 3.06e-01
 2              (pctwht13)    -0.578 1.00e+00

as compared to

<stuff> %>%
tidy.xbal(varnames_crosswalk=c(Treatment="recruited", Control="other US elem", adj.diff="recruited - other"), format=T, digits=3)
                      vars std.diff pooled.sd      z        p
 1                grdavg13  -0.4845      45.5 -2.200 3.06e-01
 2                pctwht13  -0.0649     0.343 -0.578 1.00e+00
 3             pctFRPlun13   0.8853     0.226  6.024 2.21e-08
                   NA.info recruited other US elem recruited - other
 1                              77.7          99.7             -22.0
 2              (pctwht13)     0.450         0.472           -0.0223
 3           (pctFRPlun13)     0.758         0.558             0.200
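One way to get the leave-in-place behavior, sketched in base R. The helper rename_in_place() is hypothetical and is not part of RItools; the table reuses two rows of the output above. It overwrites the matched column names where they stand rather than dropping and re-appending the columns.

```r
# Hypothetical helper (not in RItools): rename columns where they stand,
# so that the column order is preserved.
rename_in_place <- function(df, crosswalk) {
  hit <- match(names(crosswalk), names(df))
  if (anyNA(hit)) {
    stop("crosswalk names not found: ",
         paste(names(crosswalk)[is.na(hit)], collapse = ", "))
  }
  names(df)[hit] <- unname(crosswalk)
  df
}

# Two rows taken from the example output above.
tab <- data.frame(
  vars      = c("grdavg13", "pctwht13"),
  Control   = c(99.7, 0.472),
  Treatment = c(77.7, 0.450),
  adj.diff  = c(-22.0, -0.0223),
  std.diff  = c(-0.4845, -0.0649),
  stringsAsFactors = FALSE
)

cw <- c(Treatment = "recruited", Control = "other US elem",
        adj.diff = "recruited - other")

renamed <- rename_in_place(tab, cw)
names(renamed)
```

The renamed columns keep positions 2 through 4, so the group means still precede std.diff.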

@benthestatistician
Collaborator Author

benthestatistician commented Sep 19, 2018

More to-dos (not necessary for closing out this issue):

  • balanceTest()'s return objects to have their own type, distinct from those of xBalance();
  • while we're at it the originating call should be a part of that type of object, so that
  • update.default() can be invoked on objects of this type.

(The last point is distinct from this issue and may eventually occasion its own.)
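A small sketch of why storing the call matters for update(). The constructor toy_balance() and its fields are hypothetical stand-ins, not balanceTest() itself. stats::update.default() retrieves the stored call via getCall(), modifies it, and re-evaluates it, so a list element named call holding match.call() is enough.

```r
# Hypothetical constructor standing in for balanceTest(): it records the
# originating call alongside whatever it computes.
toy_balance <- function(formula, data) {
  # Right-hand-side variable names, i.e. the covariates being checked.
  covs <- setdiff(all.vars(formula), all.vars(formula[[2]]))
  out <- list(covariates = covs, nobs = nrow(data), call = match.call())
  class(out) <- "toy_balance"
  out
}

# Made-up data for illustration.
d <- data.frame(z  = c(0, 1, 0, 1),
                x1 = c(1.2, 0.7, 3.1, 2.2),
                x2 = c(5, 3, 4, 6))

b1 <- toy_balance(z ~ x1, data = d)

# No update method is defined for "toy_balance"; update.default() finds
# b1$call, swaps in the updated formula, and evaluates the new call.
b2 <- update(b1, . ~ . + x2)
b2$covariates
```

Without the call element, update() would stop with an error, since it has nothing to re-evaluate.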

@benthestatistician
Collaborator Author

(Probably merits spinning out as own issue) Resources permitting it would be nice to improve on how Date variables get presented.

vars         Control     Treatment   adj.diff  z
MALE         0.512       0.510       -0.001    0.055
DOB          13,891.887  13,879.989  -11.898   -1.216
ECON_DISADV  0.661       0.727       0.066     -0.616

Recording some notes that may be helpful to that end.

  1. The Treatment and Control entries can be typed as Date, then passed through format(). (But see below.)
  2. The difference of those two Dates can be typed as difftime, perhaps also passed through format().
  3. The difference of these two Dates can also be passed to units(), giving answers ranging from "secs" to "days" (but apparently not "months" or "years", disappointingly). The result of this can in turn be used to set the units= argument of round.Date(). It seems sensible to do this prior to formatting the Treatment and Control entries.
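A quick base-R sketch of points 1 and 2, plus the units() lookup from point 3, using the DOB row from the table above. It assumes the group means are days since 1970-01-01, which is how R stores Dates internally; the rounding step is left out.

```r
# Assumption: the DOB means are days since the Unix epoch.
origin <- "1970-01-01"
ctl <- as.Date(13891.887, origin = origin)   # Control mean, typed as Date
trt <- as.Date(13879.989, origin = origin)   # Treatment mean, typed as Date

# Point 1: format() turns each Date into a calendar-date string.
ctl_chr <- format(ctl)
trt_chr <- format(trt)

# Point 2: the difference as a difftime; units default to "auto".
gap <- difftime(trt, ctl)
gap_chr <- format(gap)

# Point 3: the unit that was chosen for the difference.
gap_units <- units(gap)

c(Control = ctl_chr, Treatment = trt_chr, adj.diff = gap_chr)
```

The formatted results are character strings, so they could not share a numeric column with the MALE and ECON_DISADV rows.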

@markmfredrickson
Owner

On Dates, one thing that is a little tricky is that tidy wants to produce a data.frame. Each column of a data.frame has a single type, so it is non-trivial to get a different presentation for specific rows.

Some options:

  1. Make all the columns character type. Then we can format as we see fit. The downside here is that if you want to use the results programmatically, you would need to parse it back to another type.
  2. Flip the table so that variables are rows. This could work for the group means and differences, but we're probably back to square one with z-scores and p-values.
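A rough sketch of option 1 in base R, with two rows from the table above. The is_date flag and the fmt_row() helper are hypothetical; in practice the flag would have to come from the original data. It shows both the per-row formatting and the cost of parsing back.

```r
# Numeric table as it stands today (two rows from the example above).
num <- data.frame(
  vars      = c("MALE", "DOB"),
  Control   = c(0.512, 13891.887),
  Treatment = c(0.510, 13879.989),
  stringsAsFactors = FALSE
)

# Hypothetical flag marking which rows hold Date-valued variables.
is_date <- num$vars == "DOB"

# Hypothetical per-row formatter: Dates as calendar dates, the rest as
# fixed-point numbers. Both branches return character.
fmt_row <- function(x, as_date) {
  if (as_date) {
    format(as.Date(x, origin = "1970-01-01"))
  } else {
    formatC(x, format = "f", digits = 3)
  }
}

# Option 1: every display column becomes character.
chr <- num
for (col in c("Control", "Treatment")) {
  chr[[col]] <- mapply(fmt_row, num[[col]], is_date, USE.NAMES = FALSE)
}
str(chr)

# The downside: programmatic use now needs an explicit parse step.
male_ctl <- as.numeric(chr$Control[!is_date])
```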

@benthestatistician
Collaborator Author

(Continuing my last comment above, in which I complained about units() applied to a difftime object not returning a unit larger than days.)

It turns out that going past weeks requires grappling with some additional complications. The lubridate package does this, and perhaps we could tap into their solution. It would seem to require typing and formatting our "adj.diff" columns as "duration".

@benthestatistician
Collaborator Author

This could be wrapped up even while leaving the questions about Date variables for another day.
