Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multivariate EMM results from aov_ez #45

Open
rvlenth opened this issue Feb 19, 2018 · 15 comments
Open

Multivariate EMM results from aov_ez #45

rvlenth opened this issue Feb 19, 2018 · 15 comments

Comments

@rvlenth
Copy link
Contributor

rvlenth commented Feb 19, 2018

The help page for aov_ez states:

A caveat regarding the use of emmeans concerns the assumption of sphericity for ANOVAs including within-subjects/repeated-measures factors (with more than two levels). While the ANOVA tables per default report results using the Greenhousse-Geisser correction, no such correction is available when using emmeans. This may result in anti-conservative tests.

I think this is somewhat misleading, in that it is true only because the implementation of emm_basis.aov_ez relies on the aov results. It appears that aov_ez also fits a multivariate model, which is returned in the lm member of the object. If that were actually used fully, users could obtain EMMs based on the multivariate model. Those results would not have the deficiencies mentioned above.

To illustrate, here is an example from the help page for aov_ez:

data(md_12.1)
aez <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"), 
       anova_table=list(correction = "none", es = "none"))

Here is the reference grid for the lm member of the object:

ref_grid(aez$lm)
## 'emmGrid' object with variables:
##     1 = 1
##     rep.meas = multivariate response levels: X0_absent, X0_present, X4_absent, X4_present, X8_absent, X8_present

So lets use the mult.names option to sort these out:

aez.mult <- ref_grid(aez$lm, 
    mult.levs = list(noise = c("absent", "present"), 
                     angle = c("X0", "X4", "X8")))

Now compare the results:

emmeans(aez.mult, ~ noise * angle)
## noise   angle emmean       SE df lower.CL upper.CL
## absent  X0       462 18.00000  9 421.2812 502.7188
## present X0       492 28.00000  9 428.6596 555.3404
## absent  X4       510 27.20294  9 448.4627 571.5373
## present X4       660 34.64102  9 581.6366 738.3634
## absent  X8       528 24.97999  9 471.4913 584.5087
## present X8       762 36.93237  9 678.4532 845.5468
##
## Confidence level used: 0.95 

emmeans(aez, ~ noise * angle)
## noise   angle emmean       SE    df lower.CL upper.CL
## absent  X0       462 28.97125 19.79 401.5263 522.4737
## present X0       492 28.97125 19.79 431.5263 552.4737
## absent  X4       510 28.97125 19.79 449.5263 570.4737
## present X4       660 28.97125 19.79 599.5263 720.4737
## absent  X8       528 28.97125 19.79 467.5263 588.4737
## present X8       762 28.97125 19.79 701.5263 822.4737
##
## Confidence level used: 0.95 

The predictions are the same, but the SEs and df for aez.mult are obtained from the multivariate model.

afex could provide an option so that the user may choose which analysis they want. This can be done by adding, say, a mode argument to emm_basis.afex_aov, that can have values like "multivariate" (would use the lm member like in this example) and "univariate" (the current default). See current emmeans support for such as lme objects (simple) or clm (complex) to see how this may be done. I think all you need to do in this case is to call emm_basis for the lm member and set misc$ylevs to the needed levels.

@singmann
Copy link
Owner

I have now implemented emmeans support for multivariate models in addition to the existing univariate models, basically as described here. To get the multivariate tests one needs to set model = "multivariate" in the call to emmeans() et al. The following example also shows that one can set this globally.

data(md_12.1)
aez <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))

emmeans(aez, c("angle", "noise"), model = "univariate")
#  angle noise   emmean       SE    df lower.CL upper.CL
#  X0    absent     462 28.97125 19.79 401.5263 522.4737
#  X4    absent     510 28.97125 19.79 449.5263 570.4737
#  X8    absent     528 28.97125 19.79 467.5263 588.4737
#  X0    present    492 28.97125 19.79 431.5263 552.4737
#  X4    present    660 28.97125 19.79 599.5263 720.4737
#  X8    present    762 28.97125 19.79 701.5263 822.4737
# 
# Confidence level used: 0.95 

emmeans(aez, c("angle", "noise"), model = "multivariate")
#  angle noise   emmean       SE df lower.CL upper.CL
#  X0    absent     462 18.00000  9 421.2812 502.7188
#  X4    absent     510 27.20294  9 448.4627 571.5373
#  X8    absent     528 24.97999  9 471.4913 584.5087
#  X0    present    492 28.00000  9 428.6596 555.3404
#  X4    present    660 34.64102  9 581.6366 738.3634
#  X8    present    762 36.93237  9 678.4532 845.5468
# 
# Confidence level used: 0.95 

afex_options(emmeans_model = "multivariate") # set globally
emmeans(aez, c("angle", "noise"))
#  angle noise   emmean       SE df lower.CL upper.CL
#  X0    absent     462 18.00000  9 421.2812 502.7188
#  X4    absent     510 27.20294  9 448.4627 571.5373
#  X8    absent     528 24.97999  9 471.4913 584.5087
#  X0    present    492 28.00000  9 428.6596 555.3404
#  X4    present    660 34.64102  9 581.6366 738.3634
#  X8    present    762 36.93237  9 678.4532 845.5468
# 
# Confidence level used: 0.95 

I have also changed the help page which now reads:

A caveat regarding the use of emmeans concerns the assumption of sphericity for ANOVAs including within-subjects/repeated-measures factors (with more than two levels). The current default for follow-up tests uses a univariate model (model = "univariate" in the call to emmeans), which does not adequately control for violations of sphericity. This may result in anti-conservative tests and contrasts somewhat with the default ANOVA table which reports results based on the Greenhousse-Geisser correction. An alternative is to use a multivariate model (model = "multivariate" in the call to emmeans) which should handle violations of sphericity better. The default will likely change to multivariate tests in one of the next versions of the package.

Does this solve this issue for you?

@rvlenth
Copy link
Contributor Author

rvlenth commented Feb 25, 2018

Important note

In DESCRIPTION, be sure to put in the dependency emmeans (>= 1.1.2).

Testing the example

I can't get the example to run on my system:

> data(md_12.1)
> aez <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))
Error in `$<-.data.frame`(`*tmp*`, ges, value = numeric(0)) : 
  replacement has 0 rows, data has 4

This could be a problem on my system. But I updated all packages and the problem persists. FWIW, here is my session information:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] afex_0.20-1   emmeans_1.1.2 lme4_1.1-15   Matrix_1.2-12

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15        mvtnorm_1.0-7       lattice_0.20-35     zoo_1.8-1          
 [5] digest_0.6.15       cellranger_1.1.0    plyr_1.8.4          backports_1.1.2    
 [9] acepack_1.4.1       stats4_3.4.3        coda_0.19-1         ggplot2_2.2.1      
[13] pillar_1.1.0        rlang_0.2.0         lazyeval_0.2.1      curl_3.1           
[17] multcomp_1.4-8      readxl_1.0.0        rstudioapi_0.7      minqa_1.2.4        
[21] data.table_1.10.4-3 car_3.0-0           nloptr_1.0.4        rpart_4.1-11       
[25] checkmate_1.8.5     splines_3.4.3       stringr_1.3.0       foreign_0.8-69     
[29] htmlwidgets_1.0     munsell_0.4.3       compiler_3.4.3      base64enc_0.1-3    
[33] lmerTest_2.0-36     htmltools_0.3.6     nnet_7.3-12         tibble_1.4.2       
[37] gridExtra_2.3       htmlTable_1.11.2    coin_1.2-2          Hmisc_4.1-1        
[41] rio_0.5.9           codetools_0.2-15    MASS_7.3-48         grid_3.4.3         
[45] nlme_3.1-131.1      xtable_1.8-2        gtable_0.2.0        magrittr_1.5       
[49] scales_0.5.0        stringi_1.1.6       estimability_1.3    carData_3.0-1      
[53] reshape2_1.4.3      latticeExtra_0.6-28 sandwich_2.4-0      openxlsx_4.0.17    
[57] Formula_1.2-2       TH.data_1.0-8       RColorBrewer_1.1-2  tools_3.4.3        
[61] forcats_0.3.0       parallel_3.4.3      abind_1.4-5         survival_2.41-3    
[65] yaml_2.1.16         colorspace_1.3-2    cluster_2.0.6       knitr_1.20         
[69] haven_1.1.1         modeltools_0.2-21  

(wouldn't it be nice if those namespaces were alphabetized?)

@singmann
Copy link
Owner

Are you sure you are using the CRAN version of emmeans? This exact error only happened to me with the old version of emmeans. Once I downloaded from CRAN it disappeared.

And regarding the package version. I have simply added the corresponding check to the function (instead to the package level). If one has an older version one can still use emmeans. However, if the user wants to use multivariate tests, it fails with an appropriate error message (so given that your package version is high enough, not for you).

@rvlenth
Copy link
Contributor Author

rvlenth commented Feb 26, 2018

Henrik,

Please note that the error I experienced occurred while trying to fit the model; I never got as far as calling any emmeans functions. I traced the call into aov_car, and it appears the error occurs around line 250 in the function body, in the code block:

   if (return == "afex_aov") {
   # lines omitted ...
        afex_aov$anova_table <- do.call("anova", args = c(object = list(afex_aov),  ### <- Right here ###
            observed = list(observed), anova_table))
        return(afex_aov)
    }

I traced it further into anova.afex_aov, and the error occurs on the following line (line 87 of the function body)

        es_df$ges <- tmp2$SS/(tmp2$SS + sum(unique(tmp2[, "Error SS"])) + 
            obs_SSn1 - obs_SSn2)

which tries to access non-existing columns:

Browse[3]> names(tmp2)
[1] "Sum Sq"   "num Df"   "Error SS" "den Df"   "F value"  "Pr(>F)"   "MSE" 

I tried tricking it into working by creating tmp2$SS, but then another error occurs later on.

At any rate, I'm fairly certain that the error isn't occurring in emmeans. I wonder if there are changes you haven't yet pushed to github?

@singmann
Copy link
Owner

Hi Russ,
Sorry that was my fault for not reading your message correctly. It was already late on Sunday evening for me.

The reason for this bug is that you use the development version of car (3.0). I have also downloaded it on the weekend and tried it and got the same bug with afex. They must have changed the return value of the Anova() function or something like that. I have to investigate this further, but I expect to only find the time at the end of the week.

So I guess when you downgrade it should work.

@rvlenth
Copy link
Contributor Author

rvlenth commented Feb 27, 2018

Aha! Indeed, downgrading to the current CRAN version of car (rather than the development version) makes all the difference. I now get it to work as shown.

I still suggest including emmeans (>= 1.1.2) in DESCRIPTION. You are right that the checks in recover_data.aov_ez trap the situation where emmeans needs to be upgraded; but the version dependency in DESCRIPTION would cause emmeans to be automatically upgraded also when a user does install.packages("afex").

Also, it might be helpful to include a multivariate illustration in examples(aov_ez).

@singmann
Copy link
Owner

I will definitely add more examples about the new functionality.

And I agree that automatic updating of emmeans seems like the better idea. There is not much to loose by adopting this.

I should maybe also consider demoting the dependency of emmeans from Depends to Suggests. In this case, users can use afex without needing emmeans (which again needs other packages such as multcomp which is not absolutely necessary).

@singmann
Copy link
Owner

I have finally added the emmeans version number to the package. This afex version will be submitted to CRAN today.

Now, I only need to figure out how to provide methods for emmeans while only having it in Suggests and not in Depends. Any ideas?

@rvlenth
Copy link
Contributor Author

rvlenth commented Jun 24, 2018 via email

@rvlenth
Copy link
Contributor Author

rvlenth commented Jun 25, 2018 via email

@singmann
Copy link
Owner

Hi Russ,

Thanks for the detailed explanation. I have tried to move it to Suggests and had the problem that R CMD check complained that the generic was not there. Good to know that I just need to export emm_basis.myobject for it to work without explicitly mentioning that it is an S3 method. I guess the same holds for recover_data.myobject.

I have just yesterday pushed a new version of afex to CRAN in which I have removed the strong dependency on coin (now only in suggests) and stringr (exchanged all corresponding calls with calls to base R functions). I plan to move emmeans from Depends to Suggests in the next update. So I will make this change in the next days and then try it out for some time before pushing it to CRAN. Great to hear that you also reduce the package dependency footprint of emmeans. Small dependency footprints are the way forward. Users can attach what they need to their workspace on their own.

Cheers,
Henrik

@singmann
Copy link
Owner

singmann commented Aug 9, 2018

I have just pushed a version to github where emmeans is only in suggests and need to be attached explicitly either via library("emmeans") or via emmeans::emmeans(). I will test it a bit more, but will try to submit this version to CRAN this month.

@rvlenth
Copy link
Contributor Author

rvlenth commented Aug 10, 2018

Henrik,

Sounds promising! I took a quick glance at your zzz.R file, and it appears that you are correctly registering the methods needed by emmeans.

Guessing that this is a hectic time for you as you prepare to move to a new position. My hopes that everything goes smoothly and you'll be quickly settled in the new place.

@JohnnyDoorn
Copy link

Hi @rvlenth and @singmann,

Apologies for digging up such an ancient thread, but I've been looking into this option myself for how we handle follow-up tests in the JASP RM ANOVA. After a chat with Henrik some years ago, I added this option to JASP, but I now want to make the multivariate model the default. I cannot find much information about the underlying models/functions though - are there any resources that explain the difference between univariate/multivariate in more detail?

Thanks for your help, and for your respective packages - they make my life a lot easier =)
Johnny

@singmann
Copy link
Owner

Hi Johnny,

To be quite honest I am not sure about the proper mathematical differences between both methods. However, I had a student do some simulations on the Type I error properties recently. The first results indeed support switching to multivariate as it seems to maintain Type I errors better. As soon as I have the report I am happy to post more details.

Cheers,
Henrik

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants