
vector memory exhausted issue #31

Open
leqi0001 opened this issue Jan 7, 2023 · 1 comment

leqi0001 commented Jan 7, 2023

Hi,

Thanks for developing this package!

I'm following the vignette and trying to run glmmSeq on a relatively small dataset (26 samples × 20,000 genes). The 26 samples form 13 pairs, which I model as a random effect, (1|individual). If I use the model ~ disease + (1|condition) + covar1 + covar2 + covar3 + covar4, R gives me Error: cannot allocate vector of size 6223.5 Gb. It runs fine if I remove one fixed-effect variable. It won't run on an HPC either; I suppose no machine can handle a vector of this size.

myles-lewis (Owner) commented

Hi leqi0001,

Thanks. I haven't seen this error before. I suggest you try to isolate the issue as follows:

  1. Take the column of data for just 1 gene
  2. Apply log2(x + 1) so that the counts become approximately Gaussian
  3. Add your metadata (a sketch of these three steps follows below)
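
A minimal sketch of these three steps, assuming a genes × samples count matrix `counts` and a data frame `metadata` with one row per sample (both names, and the gene ID "GENE1", are placeholders):

```r
# Placeholders: `counts` is a genes x samples count matrix and `metadata`
# is a data.frame with one row per sample, in matching order.
df <- metadata
gene_counts <- as.numeric(counts["GENE1", ])  # step 1: counts for one gene
df$gene  <- log2(gene_counts + 1)             # step 2: log2(x + 1) transform
df$count <- gene_counts                       # raw counts, kept for the glmer step below
```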

Fit your model using:
fit <- lme4::lmer(formula, data)
where your formula is of the form gene ~ disease + (1|condition) + covar1 + covar2 + covar3 + covar4

Examine the result using summary(fit)
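
Concretely, using the placeholder `df` built above, the call might look like:

```r
# Gaussian mixed model on the log2-transformed expression of one gene
fit <- lme4::lmer(
  gene ~ disease + covar1 + covar2 + covar3 + covar4 + (1 | condition),
  data = df
)
summary(fit)
```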
See if this works on a single gene. If it does, move on to the negative binomial model:
fit <- lme4::glmer(formula, data, family = MASS::negative.binomial(theta = 1/disp))

Try fixing the dispersion disp to a simple value, e.g. 1, which makes the model simpler as it is then essentially a Poisson-type model. This time you'll need to provide count data, not Gaussian data: count ~ disease + (1|condition) + covar1 + covar2 + covar3 + covar4
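
A minimal sketch, again using the placeholder `df` and its raw `count` column:

```r
# Negative binomial mixed model with a fixed dispersion
disp <- 1  # fixed dispersion; theta = 1/disp, so this is close to a Poisson model
fit_nb <- lme4::glmer(
  count ~ disease + covar1 + covar2 + covar3 + covar4 + (1 | condition),
  data = df,
  family = MASS::negative.binomial(theta = 1 / disp)
)
summary(fit_nb)
```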

This way you will find out whether a mixed model of such a magnitude is feasible.

I suspect the model is too large. Mixed models get big quickly because, in essence, there is a separate regression for each 'individual', i.e. for each level of the random effect.
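
For a sense of scale, a quick back-of-envelope (assuming the failed allocation is a single dense matrix of 8-byte doubles):

```r
# How many 8-byte doubles would a 6223.5 Gb allocation hold?
n <- 6223.5 * 2^30 / 8   # ~8.4e11 doubles
sqrt(n)                  # ~914,000 -- a dense square matrix of that order
```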

Best,
Myles
