Merging requires much memory. Is there a way to split the json output ? #35

lindenb · 2021-02-15T08:35:51Z

Hi all,
Thank you for ExpansionHunterDeNovo,

I'm currently testing ExpansionHunterDeNovo on a set of ~1500 WGS case/control samples. Everything works, but the merging step takes too much memory and my jobs are usually killed by the cluster manager.

Is there a way to split the JSON to reduce the required memory? Splitting by pattern? Splitting by chromosome?

Thank you for your help.

egor-dolzhenko · 2021-02-16T05:07:53Z

Thank you for using the program!

I suspect that dinucleotide repeats are causing the issue. So splitting the analysis by the repeat unit length might be the way to go. Could you please run this Linux binary with "--min-unit-len" set to 3?

I think discarding all dinucleotide repeats from the downstream analysis may be reasonable anyway because (a) if there are very many dinucleotide repeats, they will dominate the analysis and make it much harder to detect expansions with longer motifs and (b) the vast majority of known pathogenic repeats have motifs of size 3 and longer.
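To illustrate the idea of splitting by repeat unit length, here is a minimal Python sketch. It assumes a simplified, hypothetical mapping from motif to count and does not reflect the actual layout of the ExpansionHunterDeNovo JSON; the function name `split_by_unit_len` and the toy values are made up for illustration only.

```python
from collections import defaultdict


def split_by_unit_len(motif_counts, min_unit_len=3):
    """Group a {motif: count} mapping by motif length.

    Motifs shorter than min_unit_len (e.g. dinucleotides) are dropped,
    mirroring the effect of a minimum unit length of 3.
    """
    groups = defaultdict(dict)
    for motif, count in motif_counts.items():
        if len(motif) >= min_unit_len:
            groups[len(motif)][motif] = count
    return dict(groups)


# Toy values, not real data: a hypothetical motif -> count mapping.
example = {"AC": 120, "AG": 95, "CAG": 14, "CCG": 9, "AAGGG": 3}
by_len = split_by_unit_len(example)
print(by_len)
```

Each resulting group could then be processed separately, so that no single step has to hold every motif in memory at once.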

We will consider changing the default value of "--min-unit-len" to 3 in the next release.

lindenb · 2021-02-16T22:08:27Z

@egor-dolzhenko thank you very much! I won't be able to use your new version before next week.

egor-dolzhenko · 2021-02-16T23:30:17Z

Sounds good @lindenb! Please let me know if there are any issues with the new version.

lindenb · 2021-02-25T10:14:23Z

@egor-dolzhenko
Hi, thank you for the new binary. I tested it with ~400 WGS samples and --min-unit-len 3. Computing the 'merge' was much faster! I've forwarded the results to my biostatistician colleague, but at first glance I no longer see those low p-values that looked like false positives.

egor-dolzhenko · 2021-02-25T21:02:06Z

Glad to hear it @lindenb! Thank you for the update.
