High memory usage of modkit extract full #306

Shians · 2024-11-27T00:58:26Z

I have run dorado with m5C,inosine_m6A,pseU and want to extract the modifications into a table. However, when I run modkit extract full, it keeps being killed for running out of memory. I've bumped the memory allocation to 256 GB for a 7.8 GB BAM file and it's still being killed. Is there an obvious reason why this happens? Does modkit store the whole modification table in memory before writing it out?


ArtRand · 2024-11-27T16:39:33Z

Hello @Shians,

The command should stream the output to either standard out or a file, depending on what you've specified. The program tends to be write-bound, however, so records that have not yet been written are buffered in memory. If you've run a model with C, A, and U modification calls (and the A-mods are 3-class), you'll end up accumulating many records for each read, so the buffered data can be quite large (as you've seen). The documentation describes some methods for reducing memory usage. I also generally recommend, when possible, extracting only the reads that you need to operate on. For example, grab the reads for one or a few transcripts at a time.
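As a rough illustration of the "one or a few transcripts at a time" suggestion, here is a minimal Python sketch that subsets the BAM per transcript with samtools and then runs the extraction on each small subset. It is not taken from the modkit documentation. It assumes that the BAM is coordinate-sorted and indexed, that samtools and modkit are on the PATH, and that `modkit extract full` takes the input BAM and the output path as positional arguments (check `modkit extract full --help` for your version). The helper names `build_commands` and `run_all`, the file names, and the transcript IDs are all hypothetical.

```python
import re
import subprocess
from pathlib import Path


def build_commands(bam, transcripts, out_dir):
    """Build one (samtools_cmd, modkit_cmd) pair per transcript.

    Nothing is executed here; the commands are returned as argument lists.
    """
    out_dir = Path(out_dir)
    pairs = []
    for name in transcripts:
        # Transcript IDs can contain characters that are awkward in file
        # names (for example "|"), so use a sanitized copy for the paths.
        safe = re.sub(r"[^A-Za-z0-9._-]", "_", name)
        subset = out_dir / f"{safe}.bam"
        table = out_dir / f"{safe}.tsv"
        # samtools view with a region argument needs an indexed BAM.
        samtools_cmd = ["samtools", "view", "-b", "-o", str(subset), str(bam), name]
        # Assumed invocation: positional input BAM, then output path.
        modkit_cmd = ["modkit", "extract", "full", str(subset), str(table)]
        pairs.append((samtools_cmd, modkit_cmd))
    return pairs


def run_all(bam, transcripts, out_dir):
    """Run the subset and extract steps for each transcript in turn."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for samtools_cmd, modkit_cmd in build_commands(bam, transcripts, out_dir):
        subprocess.run(samtools_cmd, check=True)
        subprocess.run(modkit_cmd, check=True)
        # Remove the temporary per-transcript BAM once its table is written.
        Path(samtools_cmd[4]).unlink()
```

Calling `run_all("reads.bam", ["tx1", "tx2"], "out")` would write one table per transcript into `out/`. Because each modkit run only sees the reads for a single transcript, the amount of unwritten data that can pile up in memory at any one time should be correspondingly smaller.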

ArtRand added the troubleshooting workflow and data preparation questions label Nov 27, 2024
