shared LCA index for many processes #909
Comments
can the LCA index be loaded into Redis, @luizirber?
Right now, no... A very hackish way would be to do something like
No... LCA doesn't implement any of the storages available for SBT (or even use the same logic). That said, it wouldn't be a stretch to add the current LCA format to a Redis database. I'm not sure that would make things faster (since every dictionary element access would have to go thru Redis), but maybe it works? I sort of started doing something for
ref #1484 too - getting closer to being able to do this with generic RPC built around
not sure if this helps, but one can run the metagraph index like a server and communicate with it via ports and some python bindings -- something similar would be so cool to have for SBT/LCA indices -- https://github.com/ratschlab/metagraph
@phiweger #1808 might be interesting to you - sqlite3-based storage and search of databases, with fast startup and (presumably) multiple read-only clients. It does not yet implement the taxonomy side of LCA databases, but I think that's straightforward at this point, and I could prioritize it if it meets your other needs.
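For anyone wondering what "multiple read-only clients" looks like in practice, here is a minimal sketch using only Python's standard `sqlite3` module. It is not the code from #1808: the table name `hash_lineage`, its columns, and the helper functions `build_toy_index` and `classify` are all hypothetical and only there for illustration. The idea is that each worker opens its own read-only connection to the same on-disk file, so nothing has to be loaded into memory up front and queries do not need to be listed in advance.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_toy_index(path):
    """Write a toy hash -> lineage table (hypothetical schema, not sourmash's)."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE hash_lineage (hashval INTEGER PRIMARY KEY, lineage TEXT)"
    )
    con.executemany(
        "INSERT INTO hash_lineage VALUES (?, ?)",
        [
            (1, "Bacteria;Proteobacteria"),
            (2, "Bacteria;Firmicutes"),
            (3, "Archaea;Euryarchaeota"),
        ],
    )
    con.commit()
    con.close()


def classify(path, hashes):
    """Open a private read-only connection and look up the query hashes."""
    # mode=ro makes SQLite refuse writes, so many clients can share the file.
    con = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        placeholders = ",".join("?" * len(hashes))
        rows = con.execute(
            "SELECT lineage FROM hash_lineage "
            f"WHERE hashval IN ({placeholders}) ORDER BY hashval",
            list(hashes),
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        con.close()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "toy_index.sqlite")
    build_toy_index(db_path)
    queries = [[1, 3], [2], [99]]
    # Threads stand in for independent clients here; separate processes
    # (e.g. one per workflow job) would call classify() the same way.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for query, result in zip(queries, pool.map(lambda q: classify(db_path, q), queries)):
            print(query, result)
```

Since every call to `classify` opens and closes its own connection, a workflow manager could in principle launch one job per query against the same file, with startup cost limited to opening the database.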
uh! well, this seems like a nice solution. I will try to code up a small workflow of read-in-parallel processes, thanks!
Closing, given that we are now shifting (slowly but surely) to mastiff-style indices and other very performant on-disk data structures.
is there a way to load the LCA index into memory once, but then query it from many different processes WITHOUT specifying a list of queries beforehand? i.e. to treat the index more like a database?
my use case is that I would like to use LCA as part of a workflow managed by snakemake, but I'd like to allow parallel queries. using e.g.
sourmash lca classify ...
only works if I pass all the genomes at once, which does not leverage the workflow manager's ability to distribute processes. thx!