High memory usage of modkit extract full #306

Open
Shians opened this issue Nov 27, 2024 · 1 comment
Labels
troubleshooting workflow and data preparation questions

Comments

@Shians
Shians commented Nov 27, 2024

I have run dorado with m5C,inosine_m6A,pseU and want to extract the modifications into a table. However, when I run modkit extract full, it keeps getting killed for running out of memory. I've bumped the memory allocation to 256GB for a 7.8GB BAM file and it's still being killed. Is there an obvious reason why this happens? Does modkit store the whole modification table in memory before writing it out?

@ArtRand
Contributor

ArtRand commented Nov 27, 2024

Hello @Shians,

The command should stream the output to either standard out or a file, depending on what you've specified. The program tends to be write-bound, however, so records that haven't been written yet are buffered in memory. If you've run a model with C, A, and U modification calls (and A-mods are 3-class), you'll end up accumulating many records for each read, so the buffered data can be quite large (as you've seen). There are some methods in the documentation for how to reduce the memory usage. I also generally recommend, when possible, extracting only the reads that you need to operate on. For example, grab the reads for one or a few transcripts at a time.
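One way to follow the "one or a few transcripts at a time" suggestion is to subset the BAM per transcript with samtools and run the extraction on each subset. The sketch below is illustrative only, not the maintainers' recommended workflow. It assumes the BAM is coordinate-sorted and indexed (required for `samtools view` with a region), that the reads are aligned to a transcriptome so each reference name is a transcript, and that `modkit extract full` takes an input BAM followed by an output path. The helper names `build_commands` and `extract_per_transcript`, and the file names, are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(bam, transcript, out_dir):
    """Return the (subset, extract) commands for a single transcript."""
    out_dir = Path(out_dir)
    subset_bam = out_dir / f"{transcript}.bam"
    table = out_dir / f"{transcript}.tsv"
    # samtools view with a region argument needs a sorted, indexed BAM.
    subset_cmd = ["samtools", "view", "-b", "-o", str(subset_bam), str(bam), transcript]
    # Assumed invocation: modkit extract full <in.bam> <out.tsv>
    extract_cmd = ["modkit", "extract", "full", str(subset_bam), str(table)]
    return subset_cmd, extract_cmd


def extract_per_transcript(bam, transcripts, out_dir):
    """Run the subset and extract steps for each transcript in turn."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for transcript in transcripts:
        for cmd in build_commands(bam, transcript, out_dir):
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)
```

Calling `extract_per_transcript("reads.bam", ["tx1", "tx2"], "out")` would write one table per transcript, so only one transcript's records are buffered at any time. Transcript names containing characters that are awkward in file names (such as `|` or `/`) would need sanitising before being used as output paths.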

@ArtRand ArtRand added the troubleshooting workflow and data preparation questions label Nov 27, 2024