DML models betas using lm() but shouldn't it use glm(family = "binomial") or model M-values? #192

Open
jdreyf opened this issue Dec 19, 2024 · 3 comments

Comments

@jdreyf

jdreyf commented Dec 19, 2024

In R/dm.R, the DML function models betas as the dependent variable using linear regression with lm(), e.g. m0 <- lm(betas[i,]~.+0, data=as.data.frame(mm)). But I think that if you want to use lm(), the dependent variable should be M-values, whereas if you want the dependent variable to be betas, you should use glm(family="binomial").
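For readers following along, the three options mentioned above can be put side by side for a single probe. This is only a minimal sketch on simulated data, not SeSAMe's code: the names betas_i and group, the simulated values, and the plain `~ group` formula (rather than the model-matrix form quoted above) are all assumptions for illustration. It also swaps in quasibinomial for binomial, purely because array betas are not counts and binomial would warn about non-integer successes.

```r
set.seed(1)

# Hypothetical data for one probe: 10 samples per group, betas close to zero
group   <- factor(rep(c("ctl", "trt"), each = 10))
betas_i <- c(rbeta(10, 2, 60), rbeta(10, 4, 60))

# Beta -> M-value (base-2 logit)
m_i <- log2(betas_i / (1 - betas_i))

# (a) linear model on betas
fit_beta <- lm(betas_i ~ group)

# (b) linear model on M-values
fit_m <- lm(m_i ~ group)

# (c) logit-link GLM on betas (quasibinomial is an assumption, see lead-in)
fit_glm <- glm(betas_i ~ group, family = quasibinomial(link = "logit"))

# p-value of the group effect under each model
p_values <- c(
  lm_beta = summary(fit_beta)$coefficients["grouptrt", "Pr(>|t|)"],
  lm_M    = summary(fit_m)$coefficients["grouptrt", "Pr(>|t|)"],
  glm_qb  = summary(fit_glm)$coefficients["grouptrt", "Pr(>|t|)"]
)
print(p_values)
```

Models (b) and (c) both work on a logit scale, so their group coefficients are log-odds differences (base 2 and base e respectively), while model (a) reports a difference in betas directly.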

Also, https://bioconductor.org/packages/release/bioc/vignettes/sesame/inst/doc/modeling.html dated 29 Oct 2024 describes DML as using "mixed linear models", but I didn't see that in the code.

Thanks for your fantastic papers on DNAme and this helpful software.

Best,
JD

@jdreyf jdreyf changed the title DML models betas using lm() but should probably use glm(family = "binomial") or model M-values DML models betas using lm() but shouldn't it use glm(family = "binomial") or model M-values? Dec 19, 2024
@zwdzwd
Owner

zwdzwd commented Dec 25, 2024

Thanks for the suggestion. I agree that binomial regression (or maybe beta regression, as the number of total trials is often unclear for array data) is better in theory, but I haven't seen a major issue in practice with linear models. I personally don't like M-values as they lack a direct physical interpretation (unlike the beta value, which reads as a %).

Re: Mixed linear models: I apologize for the oversight. It was implemented but deleted in the recent re-implementation of the package. I am struggling to find time, but will reinstate it asap.

@jdreyf
Author

jdreyf commented Dec 25, 2024

Even though M-values lack a physical interpretation, I would still recommend using them for the linear regression and reporting the mean beta values per group. I think these mean beta values should be calculated by taking the mean M-value per group and transforming to the beta scale. If you want to compare regressing M-values vs. beta values in practice, I think you could look at cases where the betas are near zero or one, since then the betas are quite non-normal.
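To make the suggested group summary concrete, here is a small sketch of averaging on the M scale and transforming back to the beta scale, compared with a plain arithmetic mean of betas. The helper names beta_to_m and m_to_beta and the example values are hypothetical; the transform assumed is the usual base-2 logit, M = log2(beta / (1 - beta)).

```r
# Beta <-> M-value transforms (base-2 logit and its inverse)
beta_to_m <- function(b) log2(b / (1 - b))
m_to_beta <- function(m) 2^m / (1 + 2^m)

# Hypothetical betas for one probe in two groups, near zero and near one
betas <- c(0.01, 0.02, 0.10, 0.90, 0.95, 0.99)
group <- factor(c("A", "A", "A", "B", "B", "B"))

# Group means taken on the M scale, then transformed back to betas
mean_beta_via_m <- m_to_beta(tapply(beta_to_m(betas), group, mean))

# Plain arithmetic group means of the betas, for comparison
mean_beta_direct <- tapply(betas, group, mean)

print(rbind(via_M = mean_beta_via_m, direct = mean_beta_direct))
```

Near the boundaries the two summaries differ noticeably: the back-transformed mean sits closer to zero (group A) or closer to one (group B) than the arithmetic mean does.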

I really appreciate Sesame implementing the innovative processing routines you developed, and given Sesame's broad functionality, I can understand why you might struggle to find time for updates and maintenance. For example, the documentation for DML didn't seem to state that it applies linear regression to beta values. In case it might be helpful, I don't think you need to build out sophisticated DML and DMR analyses, since folks could use other packages for these steps. For DML, folks could apply limma (where random effects can be accounted for with limma's duplicateCorrelation) or apply dream mixed effects analyses from variancePartition, and for DMR they could apply ipDMR.

Thanks for responding so quickly,
JD

@zwdzwd
Owner

zwdzwd commented Dec 25, 2024

Yeah, it's fair that you can always transform back to beta after regression. But I usually find myself filtering out small DM events by delta beta (not delta M) as a post-processing step to control effect size anyway. DM with small effects close to zero or one should be interpreted cautiously due to the background effect.
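The post-processing step described above could look roughly like the following. This is a sketch under stated assumptions, not SeSAMe output: the res data frame, its column names, the probe labels, and both cutoffs (p_cut and delta_beta_cut) are made up for illustration, and the p-values are left unadjusted to keep the example short.

```r
# Hypothetical per-probe results, e.g. from a regression on M-values with
# group means already transformed back to the beta scale
res <- data.frame(
  probe      = c("probe1", "probe2", "probe3", "probe4"),
  p_value    = c(1e-6, 1e-5, 0.20, 1e-4),
  mean_beta1 = c(0.02, 0.30, 0.50, 0.97),
  mean_beta2 = c(0.03, 0.55, 0.52, 0.80)
)
res$delta_beta <- res$mean_beta2 - res$mean_beta1

# Assumed cutoffs: significance and minimum effect size on the beta scale
p_cut          <- 0.001
delta_beta_cut <- 0.1

# Keep probes that are significant AND have a large enough delta beta
hits <- subset(res, p_value < p_cut & abs(delta_beta) >= delta_beta_cut)
print(hits[, c("probe", "p_value", "delta_beta")])
```

In this toy table, probe1 has the smallest p-value but is dropped because its delta beta is only 0.01 near zero, which is the kind of small boundary effect the comment above advises treating cautiously.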

You are 100% right!! Savvy users can do these sophisticated DM analyses using limma etc. quite easily.
