sourmash compare runs out of memory on large comparisons #3134
Comments
hi @yuzie0314 - yes, there is a long issue #2299 about this. we are still in a bit of a confused state in terms of recommendations, but the gist of that issue is: you should be able to use @bluegenes any words of wisdom here?
Hi @ctb, This is really good news to us,
Hi @ctb, the author, I used the following command and generated a CSV result.
The result is different from Thanks for your help,
BrokenPipeError: [Errno 32] Broken pipe happened in sourmash compare command
matrix vs CSV output
hi @yuzie0314, yep, the output of Please see sourmash-bio/sourmash_plugin_branchwater#198 for a script that converts the CSV file into a numpy matrix. I haven't tried it yet myself, I'm afraid, but if you run into problems please feel free to post here and we'll see what we can do! how does
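A minimal sketch of the kind of CSV-to-matrix conversion mentioned above, not the script from sourmash-bio/sourmash_plugin_branchwater#198. It assumes a long-format CSV with one row per pair; the column names `query_name`, `match_name`, and `average_containment_ani` are assumptions and should be adjusted to the header of the actual file. Pairs that are absent from the CSV are left at 0.0, which is one way a converted matrix can differ from a dense one.

```python
import csv
import io

import numpy as np


def pairwise_csv_to_matrix(handle, row_col="query_name", col_col="match_name",
                           value_col="average_containment_ani"):
    """Build a symmetric matrix from a long-format pairwise CSV.

    The default column names are assumptions; pass the names your CSV uses.
    """
    rows = list(csv.DictReader(handle))

    # Collect labels in order of first appearance.
    labels = []
    index = {}
    for row in rows:
        for name in (row[row_col], row[col_col]):
            if name not in index:
                index[name] = len(labels)
                labels.append(name)

    matrix = np.zeros((len(labels), len(labels)), dtype=np.float64)
    np.fill_diagonal(matrix, 1.0)  # self-comparisons are taken to be 1.0
    for row in rows:
        i, j = index[row[row_col]], index[row[col_col]]
        # Mirror each value so the matrix is symmetric.
        matrix[i, j] = matrix[j, i] = float(row[value_col])
    return matrix, labels


# Tiny made-up example: the B-C pair is missing, so it stays at 0.0.
example = io.StringIO(
    "query_name,match_name,average_containment_ani\n"
    "A,B,0.95\n"
    "A,C,0.90\n"
)
matrix, labels = pairwise_csv_to_matrix(example)
```

Keeping the label list next to the matrix makes it possible to check later that two matrices are compared in the same order.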
hi @yuzie0314 I got inspired by your question (and also by some of my own research needs ;)) and built a plugin that I think will help you - see https://github.com/sourmash-bio/sourmash_plugin_betterplot/. Specifically:
If you have suggestions or requests for further functionality, please let me know! It's easy and fun to add new stuff to this plugin!
Wow, the answers are a little complicated, so I will need some time to test them in my environment, but thanks again for your contribution, really helpful @ctb. I think the solutions you provided are worth exploring. We will update you once we have any questions or good news.
Hi @ctb, The results from The results from As you can notice that
That's a great test! You should get identical results (although I will confess I have not tried it myself). I will try it out on my own set of data, but - I'm curious - why set
ohh ya, Is there anything I am missing? Just let me know; I would like to test it in my environment.
I'll have to take a look. Have you compared the Jaccard index or containment matrices, rather than the ANI? I'm wondering if there's a difference in the ANI calculations - then Jaccard/containment would be the same, but ANI would be different. (Which would be a bug, just to be clear!)
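To make the distinction concrete, here is a small sketch of how Jaccard, containment, and an ANI estimate relate. It works on plain Python sets of hashes rather than sourmash signatures. The ANI functions use a simple point estimate (containment raised to the power 1/k, and a Jaccard-based variant that assumes equally sized sets); these are illustrative assumptions, not necessarily the exact formulas used by sourmash or the branchwater plugin.

```python
def jaccard(a, b):
    """Shared hashes divided by all hashes seen in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the hashes in a that are also present in b."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0


def ani_from_containment(c, ksize):
    """Assumed point estimate: ANI ~ c ** (1 / ksize)."""
    return c ** (1.0 / ksize)


def ani_from_jaccard(j, ksize):
    """Assumed point estimate: convert Jaccard to a containment-like value
    (exact only for equally sized sets), then apply the same power rule."""
    return (2.0 * j / (1.0 + j)) ** (1.0 / ksize)


# Two equally sized made-up hash sets sharing half of their hashes.
a = set(range(0, 10))
b = set(range(5, 15))
j = jaccard(a, b)                      # 5 shared / 15 total
c = containment(a, b)                  # 5 shared / 10 in a
ani_c = ani_from_containment(c, 31)
ani_j = ani_from_jaccard(j, 31)

# A much larger set: containment is no longer symmetric.
big = set(range(0, 100))
c_small_in_big = containment(a, big)   # every hash of a is in big
c_big_in_small = containment(big, a)   # only a tenth of big is in a
```

Under these assumptions the two ANI estimates agree for equally sized sets and drift apart otherwise, so comparing the underlying Jaccard or containment values first helps isolate where a discrepancy enters.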
Sorry for the late reply. We were only focusing on the ANI comparison results and didn't check the other methods.
I am a little bit worried that the ANI numbers in sourmash_plugin_branchwater are incorrect - we are seeing differences in
hi! I did some quick validation on a subset of
ok, I sat down and did a much more thorough evaluation over in sourmash-bio/sourmash_plugin_branchwater#366, for the This comparison generated They are identical 🎉 I'll close this issue when I add a mention of the
Hi @ctb, sorry for the late reply. My command:
In addition, I also checked that their labels are in the same order, so the possible reasons I can imagine are:
but I think the main reason is most likely the former, because I also wrote a very ugly Python script to convert the results from P.S.
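One way to rule out label ordering as the cause is to align the two matrices by label before comparing them. The sketch below assumes both matrices are already loaded as square arrays with matching, unique label lists; the function name `max_abs_difference` is hypothetical, and loading the files is left out because the exact file names are not given in the thread.

```python
import numpy as np


def max_abs_difference(matrix_a, labels_a, matrix_b, labels_b):
    """Reorder matrix_b into the label order of matrix_a and return the
    largest absolute element-wise difference. Labels must be unique."""
    if set(labels_a) != set(labels_b):
        raise ValueError("the two matrices do not cover the same labels")

    # Position of every label in matrix_b.
    position_b = {name: i for i, name in enumerate(labels_b)}
    order = [position_b[name] for name in labels_a]

    # Apply the same permutation to rows and columns.
    reordered_b = np.asarray(matrix_b)[np.ix_(order, order)]
    return float(np.max(np.abs(np.asarray(matrix_a) - reordered_b)))


# Made-up example: the same values stored in two different label orders.
labels_a = ["x", "y", "z"]
matrix_a = [[1.0, 0.9, 0.8],
            [0.9, 1.0, 0.7],
            [0.8, 0.7, 1.0]]
labels_b = ["z", "x", "y"]
matrix_b = [[1.0, 0.8, 0.7],
            [0.8, 1.0, 0.9],
            [0.7, 0.9, 1.0]]
difference = max_abs_difference(matrix_a, labels_a, matrix_b, labels_b)
```

A difference of zero (or within floating-point tolerance) after alignment points to ordering as the only discrepancy; a larger value suggests the values themselves differ.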
Hi @ctb the sourmash author,
Currently we are working on using your tool to find the representative MAGs within a custom data set assembled from several deeply sequenced stool shotgun samples. Those MAGs are all classified into the same family using GTDB-Tk reference genomes.
We have up to 14,500 genomes in this data set, and we want to compute a pairwise ANI matrix using the following command.
sourmash compare -p 8 -k 31 --ani -o ani_matrix.numpy --csv ani_matrix.csv cluster_mash/*.sig
However, after around 1 hr of processing, we got a weird error called BrokenPipeError, so we started to wonder whether there is any limitation when using sourmash compare to generate an ANI matrix. I think this kind of error is derived from running out of memory; correct me if I am wrong.
P.S. We are using 16 cores and 32 GB RAM on an AWS EC2 Linux instance.
We also saw the message
Killedison for index 886 done in 9.36945 seconds
which might be another reason why this error happened. The current version is sourmash v4.8.2.
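For a sense of scale, the arithmetic below estimates the number of unique pairs and the size of a dense result matrix for 14,500 genomes. It assumes 8 bytes per value (64-bit floats), which is an assumption about storage rather than something stated in the thread, and it does not account for the memory held by the loaded signatures or by the worker processes.

```python
def unique_pairs(n):
    """Number of distinct unordered pairs among n items."""
    return n * (n - 1) // 2


def dense_matrix_bytes(n, bytes_per_value=8):
    """Bytes needed for an n x n matrix; 8 bytes per value is assumed."""
    return n * n * bytes_per_value


n = 14_500
pairs = unique_pairs(n)
matrix_gb = dense_matrix_bytes(n) / 1e9

print(pairs)                # 105117750
print(f"{matrix_gb:.2f}")   # 1.68
```

Under these assumptions the matrix itself is well below 32 GB, so if memory is the cause, the pressure would have to come from elsewhere in the run.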