SBT loading in memory #475
Comments
What is wort?
@phiweger it's a webservice for computing/retrieving/searching sourmash signatures. I don't have much yet, but an overview is available at https://github.com/dib-lab/wort/blob/master/docs/arch.md and it is online (and horrible to navigate...) at https://wort.oxli.org
kind of related to #909, sharing an LCA index once for many processes.
I now run into this problem a lot, especially when querying many signatures as part of larger workflows. The workflow manager will usually start many processes searching signatures, but with any reasonably sized SBT this crashes pretty quickly because it tries to load one SBT into memory for each process. @ctb if I use the API to load the SBT and then run Python multiprocessing queries against it, what will happen? ;)
@luizirber how do you manage queries w/ wort? I assume you don't load the index once for each query?
So, the SRA search is cheating =]
It will probably mostly-sorta-kinda work. I'm a bit nervous because the SBT code loads data from disk dynamically, and in a multithreaded context this can lead to data races and other weirdness (there is no locking at any point).
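To make the locking concern concrete, here is a minimal sketch of guarding on-demand node loads with a lock so that concurrent queries cannot race on the same node. `LazyIndex` and all of its methods are hypothetical stand-ins written for this illustration, not sourmash classes, and the real SBT code may well need a different approach.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyIndex:
    """Hypothetical stand-in for an index that loads nodes from disk on demand."""

    def __init__(self, n_nodes):
        self._n_nodes = n_nodes
        self._cache = {}               # node id -> loaded data
        self._lock = threading.Lock()
        self.load_count = 0            # number of simulated disk loads

    def _load_from_disk(self, node_id):
        # Placeholder for real I/O; returns a small set as fake node data.
        self.load_count += 1
        return {node_id, node_id + 1}

    def get_node(self, node_id):
        # The lock makes check-then-load atomic, so each node is loaded once
        # even when several threads request it at the same moment.
        with self._lock:
            if node_id not in self._cache:
                self._cache[node_id] = self._load_from_disk(node_id)
            return self._cache[node_id]

    def search(self, value):
        # Return the ids of all nodes whose data contains `value`.
        return [i for i in range(self._n_nodes) if value in self.get_node(i)]


index = LazyIndex(n_nodes=8)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(index.search, [1, 3, 5, 7]))
```

A single coarse lock like this serializes all loads, which is simple but may limit throughput; it only covers threads within one process, not separate processes started by a workflow manager.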
the
hey, I think greyhound and #1226 actually help solve this too
with sourmash v4.1.0 the memory usage of SBTs has dramatically decreased; see #1370 (comment) specifically. for in-memory single-process stuff, Finally, for read only SBTs, I very much doubt there would be any problems with sharing them between processes from disk. |
@ctb when I load an LCA db into mem with
Yep, they are interchangeable from an API perspective! We have some (minimal :) documentation here, https://sourmash.readthedocs.io/en/latest/command-line.html#indexed-databases and there's an example of using/constructing an in-memory https://github.com/dib-lab/charcoal/blob/latest/charcoal/compare_taxonomy.py#L177 Note that the
Is there a way to load an SBT into memory once and then keep it there for various queries? I know that a Redis backend was toyed with at some point, but I am unsure if this was integrated into v2.0
Thank you,
Adrian
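For a single long-lived process, one way to picture "load once, query many times" is to memoize the loader. The sketch below is only an illustration: `load_index`, `query`, the file name, and the toy dictionary index are all hypothetical, and in practice the loader body would call whichever sourmash loading function applies.

```python
from functools import lru_cache

LOAD_CALLS = []  # records each real load, for demonstration only


@lru_cache(maxsize=None)
def load_index(path):
    """Hypothetical loader: pretend to read an index from `path`."""
    LOAD_CALLS.append(path)
    # Toy "index": maps a signature name to a set of hashes.
    return {"sigA": frozenset({1, 2, 3}), "sigB": frozenset({3, 4, 5})}


def query(path, hashes):
    """Return names of stored signatures sharing at least one hash with the query."""
    index = load_index(path)  # served from the cache after the first call
    return sorted(name for name, stored in index.items() if stored & set(hashes))


first = query("example.sbt.zip", [1])
second = query("example.sbt.zip", [3])
third = query("example.sbt.zip", [9])
```

The cache lives inside one Python process, so this avoids repeated loads across queries in that process but does not by itself share the index between separate processes.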