DML models betas using lm() but shouldn't it use glm(family = "binomial") or model M-values? #192

jdreyf · 2024-12-19T03:29:11Z

In R/dm.R, the DML function models betas as the dependent variable using linear regression with lm(), e.g. m0 <- lm(betas[i,]~.+0, data=as.data.frame(mm)). But I think if you want to use lm(), the dependent variable should be M-values, whereas if you want the dependent variable to be betas, you should use glm(family="binomial").
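To make the three options in this question concrete, here is a minimal sketch on simulated data for a single probe. Only the lm() call shape comes from the line quoted above; the data, the group labels, and the effect size are made up for illustration. It also uses quasibinomial() rather than family="binomial", an assumption on my part, because betas are proportions without a known number of trials and glm() warns about non-integer successes with the plain binomial family.

```r
# Simulated example only: `mm` mimics the design-matrix name in the quoted
# line, but the data and the group effect are invented for illustration.
set.seed(1)
n <- 20
group <- factor(rep(c("ctrl", "case"), each = n / 2), levels = c("ctrl", "case"))
mm <- model.matrix(~ group)   # columns: (Intercept), groupcase
dat <- as.data.frame(mm)

# One probe with betas near zero, where normality is most strained
beta <- c(rbeta(n / 2, 2, 38), rbeta(n / 2, 4, 36))
beta <- pmin(pmax(beta, 1e-6), 1 - 1e-6)   # keep strictly inside (0, 1)

# M-value: base-2 logit of beta
mval <- log2(beta / (1 - beta))

# (a) linear model on betas, same call shape as the quoted line
fit_beta <- lm(beta ~ . + 0, data = dat)

# (b) linear model on M-values
fit_mval <- lm(mval ~ . + 0, data = dat)

# (c) logit-link GLM on betas; quasibinomial avoids the non-integer warning
fit_qb <- glm(beta ~ . + 0, family = quasibinomial(), data = dat)

# Collect the p-value of the group term from each fit (4th column of the
# coefficient table in all three summaries)
pvals <- c(
  lm_beta       = summary(fit_beta)$coefficients["groupcase", 4],
  lm_mval       = summary(fit_mval)$coefficients["groupcase", 4],
  quasibinomial = summary(fit_qb)$coefficients["groupcase", 4]
)
print(pvals)
```

On a single simulated probe the three p-values will usually be similar; any practical difference would have to be judged across many probes, especially those with betas close to zero or one.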

Also, https://bioconductor.org/packages/release/bioc/vignettes/sesame/inst/doc/modeling.html dated 29 Oct 2024 describes DML as using "mixed linear models", but I didn't see that in the code.

Thanks for your fantastic papers on DNAme and this helpful software.

Best,
JD

zwdzwd · 2024-12-25T11:07:04Z

Thanks for the suggestion. I agree that binomial regression (or maybe beta regression, as the number of total trials is often unclear for array data) is better in theory, but I haven't seen a major issue in practice with linear models. I personally don't like M-values as they lack a direct physical interpretation (like % for the beta value).

Re: Mixed linear models: I apologize for the oversight. It was implemented but deleted in a recent re-implementation of the package. I am struggling to find time, but will reinstate it asap.

jdreyf · 2024-12-25T19:11:49Z

Even though M-values lack a physical interpretation, I would still recommend using them for the linear regression and reporting the mean beta values per group. I think these mean beta values should be calculated by taking the mean M-value per group and transforming to the beta scale. If you want to compare regressing M-values vs. beta values in practice, I think you could look at cases where the betas are near zero or one, since then the betas are quite non-normal.
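As a small illustration of the reporting suggested here, the sketch below averages M-values per group and transforms the result back to the beta scale. The helper names beta_to_m and m_to_beta and the example values are hypothetical, and the M-value is assumed to be the base-2 logit of beta.

```r
# Hypothetical helpers: M-value as base-2 logit of beta, and its inverse
beta_to_m <- function(beta) log2(beta / (1 - beta))
m_to_beta <- function(m) 2^m / (1 + 2^m)

# Made-up betas for one probe in two groups
beta  <- c(0.02, 0.05, 0.10, 0.80, 0.90, 0.95)
group <- factor(c("A", "A", "A", "B", "B", "B"))

# Mean M-value per group, then back to the beta scale
mean_m           <- tapply(beta_to_m(beta), group, mean)
mean_beta_from_m <- m_to_beta(mean_m)

# Plain arithmetic mean of betas per group, for comparison
mean_beta_direct <- tapply(beta, group, mean)

print(rbind(from_M = mean_beta_from_m, direct = mean_beta_direct))
```

The two rows generally differ, because averaging on the logit scale and then back-transforming is not the same as averaging the betas directly.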

I really appreciate Sesame implementing the innovative processing routines you developed, and given Sesame's broad functionality, I can understand why you might struggle to find time for updates and maintenance. For example, the documentation for DML didn't seem to state that it applies linear regression to beta values. In case it might be helpful, I don't think you need to build out sophisticated DML and DMR analyses, since folks could use other packages for these steps, e.g. for DML folks could apply limma (where random effects can be accounted for with limma's duplicateCorrelation) or apply dream mixed effects analyses from variancePartition, and for DMR they could apply ipDMR.

Thanks for responding so quickly,
JD

zwdzwd · 2024-12-25T19:45:36Z

Yeah, it's fair that you can always transform back to beta after regression. But I usually find myself filtering out small DM events by delta beta (but not delta M) as a post-processing step to control effect size anyway. DM with small effects close to zero or one should be interpreted cautiously due to the background effect.
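A minimal sketch of that kind of post-processing filter is shown below. The results table, its column names, and both thresholds are hypothetical and are not taken from sesame's output.

```r
# Hypothetical per-probe results; values are made up for illustration
res <- data.frame(
  probe          = c("cg01", "cg02", "cg03", "cg04"),
  mean_beta_ctrl = c(0.02, 0.40, 0.97, 0.55),
  mean_beta_case = c(0.04, 0.65, 0.99, 0.50),
  pval           = c(0.001, 0.0005, 0.002, 0.30),
  stringsAsFactors = FALSE
)

# Effect size on the beta scale and on the M-value (base-2 logit) scale
beta_to_m <- function(beta) log2(beta / (1 - beta))
res$delta_beta <- res$mean_beta_case - res$mean_beta_ctrl
res$delta_m    <- beta_to_m(res$mean_beta_case) - beta_to_m(res$mean_beta_ctrl)

# Assumed thresholds; choose values appropriate for the study
alpha          <- 0.01
min_delta_beta <- 0.1

# Keep probes that are significant AND have a large enough delta beta
hits <- res[res$pval < alpha & abs(res$delta_beta) >= min_delta_beta, ]
print(hits)
```

In this toy table, cg01 has a delta M of about 1 but a delta beta of only 0.02, so it passes the p-value cutoff yet is removed by the delta beta filter. Only cg02 remains.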

You are 100% right!! Savvy users can do these sophisticated DM analyses using limma etc. quite easily.

jdreyf changed the title ~~DML models betas using lm() but should probably use glm(family = "binomial") or model M-values~~ DML models betas using lm() but shouldn't it use glm(family = "binomial") or model M-values? Dec 19, 2024
