shared LCA index for many processes #909

Closed
phiweger opened this issue Feb 21, 2020 · 8 comments
Comments

@phiweger

is there a way to load the LCA index into memory once, but then to query it with many different processes WITHOUT specifying a list of queries beforehand? i.e. to treat the index more like a database?

my use case is that I would like to use LCA as part of a workflow managed by snakemake, but I'd like to allow parallel queries. using e.g. sourmash lca classify ... only works if I pass all the genomes at once, which does not leverage the workflow manager's ability to distribute processes.

thx!

@phiweger
Author

can the LCA index be loaded into Redis @luizirber ?

@luizirber
Member

> is there a way to load the LCA index into memory once, but then to query it with many different processes WITHOUT specifying a list of queries beforehand? i.e. to treat the index more like a database?
>
> my use case is that I would like to use LCA as part of a workflow managed by snakemake, but I'd like to allow parallel queries. using e.g. sourmash lca classify ... only works if I pass all the genomes at once, which does not leverage the workflow manager's ability to distribute processes.

Right now, no... A very hackish way would be to do something like sourmash watch, but reading signatures from the input pipe. And maybe piped outputs in snakemake could be useful?
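A minimal sketch of the "read queries from the input pipe" idea, assuming nothing about sourmash internals: the index is loaded once, then one query per input line is answered until the stream ends. The names `serve_queries`, `toy_classify` and `toy_index` are hypothetical stand-ins for illustration, not sourmash functions, and the dictionary lookup is a placeholder for a real LCA classification.

```python
import io
from typing import Callable, Iterable, TextIO


def serve_queries(
    index: dict,
    lines: Iterable[str],
    out: TextIO,
    classify: Callable[[dict, str], str],
) -> int:
    """Answer one query per input line against an index that is already in memory."""
    answered = 0
    for line in lines:
        query = line.strip()
        if not query:  # skip blank lines
            continue
        out.write(f"{query}\t{classify(index, query)}\n")
        out.flush()  # flush so a downstream pipe sees each result promptly
        answered += 1
    return answered


def toy_classify(index: dict, query: str) -> str:
    # Hypothetical stand-in for an LCA lookup: exact key match only.
    return index.get(query, "unassigned")


# Toy index standing in for the expensive-to-load LCA database.
toy_index = {"genomeA.sig": "Bacteria;Proteobacteria", "genomeB.sig": "Archaea"}

# In-memory streams stand in for the pipe so the sketch runs on its own.
demo_in = io.StringIO("genomeA.sig\n\ngenomeC.sig\n")
demo_out = io.StringIO()
count = serve_queries(toy_index, demo_in, demo_out, toy_classify)
```

In a real pipeline the two `io.StringIO` objects would be replaced by `sys.stdin` and `sys.stdout`, so that whatever feeds the pipe pays the index loading cost only once.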

> can the LCA index be loaded into Redis @luizirber ?

No... LCA doesn't implement any of the storages available for SBT (or even use the same logic). That said, it wouldn't be a stretch to add the current LCA format to a Redis database. I'm not sure that would make things faster, since every dictionary element access would have to go through Redis, but maybe it works?

I sort of started doing something for search in wort, but it is not ready yet (and won't be in the next 3 months, at least).

@ctb
Contributor

ctb commented Apr 23, 2020

starting to discuss ways to improve LCA_Database loading time over in #821 and #948.

@ctb
Contributor

ctb commented May 8, 2021

ref #1484 too - getting closer to being able to do this with generic RPC built around Index API.

@phiweger
Author

phiweger commented May 8, 2021

not sure if this helps, but one can run the metagraph index like a server and communicate with it via ports and some python bindings -- something similar would be so cool to have for SBT/ LCA indices -- https://github.com/ratschlab/metagraph
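To make the "index as a server" idea concrete, here is a rough sketch using only Python's standard-library XML-RPC modules. It is neither metagraph's protocol nor a sourmash API: `TOY_INDEX` and `classify` are hypothetical, and the toy dictionary stands in for an index that is loaded once by the server process and then queried by any number of clients over a local port.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Toy index, loaded once by the server process (hypothetical contents).
# Hashes are strings here because XML-RPC integers are limited to 32 bits.
TOY_INDEX = {"h1": "Bacteria", "h2": "Bacteria", "h3": "Archaea"}


def classify(hashes):
    """Return the lineages found for the given hashes, sorted and deduplicated."""
    return sorted({TOY_INDEX[h] for h in hashes if h in TOY_INDEX})


# Port 0 asks the OS for any free port; a real deployment would fix one.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(classify)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Each workflow job would run only this client part, in its own process.
with ServerProxy(f"http://127.0.0.1:{port}") as client:
    answer = client.classify(["h1", "h3", "h9"])

server.shutdown()
server.server_close()
```

The server thread and the client share one script here only so the example runs on its own; the point is that the client needs just a host and port, so a workflow manager can launch as many clients as it likes.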

@ctb
Contributor

ctb commented Jan 26, 2022

@phiweger #1808 might be interesting to you - sqlite3-based storage and search of databases, with fast startup and (presumably) multiple read-only clients. Does not yet implement the taxonomy side of LCA databases, but I think that's straightforward at this point, and I could prioritize if it meets your other needs.
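As a rough illustration of why an on-disk SQLite file suits several read-only clients, the sketch below builds a toy table and opens it twice in read-only mode with the standard-library `sqlite3` module. The `hashes` table, its columns and the helper names are assumptions made up for this example; they are not the schema or API from #1808.

```python
import os
import sqlite3
import tempfile

# Build a toy on-disk table once (hypothetical schema, not sourmash's).
db_path = os.path.join(tempfile.mkdtemp(), "toy_index.sqlite")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE hashes (hashval INTEGER PRIMARY KEY, lineage TEXT)")
writer.executemany(
    "INSERT INTO hashes VALUES (?, ?)",
    [(1, "Bacteria"), (2, "Bacteria"), (3, "Archaea")],
)
writer.commit()
writer.close()


def open_read_only(path):
    # mode=ro makes SQLite reject any write on this connection.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def lookup(conn, hashvals):
    """Return (hashval, lineage) rows for the hash values present in the table."""
    placeholders = ",".join("?" for _ in hashvals)
    sql = (
        "SELECT hashval, lineage FROM hashes "
        f"WHERE hashval IN ({placeholders}) ORDER BY hashval"
    )
    return conn.execute(sql, list(hashvals)).fetchall()


# Two independent read-only clients; in a workflow each would be its own process.
client_a = open_read_only(db_path)
client_b = open_read_only(db_path)
result_a = lookup(client_a, [1, 3])
result_b = lookup(client_b, [2, 99])
```

Nothing is loaded into memory up front, so startup cost per client is small, and read-only connections do not block each other.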

@phiweger
Author

uh! well, this seems like a nice solution. I will try to code up a small workflow of read-in-parallel processes, thanks!

@ctb
Contributor

ctb commented Sep 23, 2023

Closing, given that we are now shifting (slowly but surely) to mastiff style indices and other very performant on-disk data structures.

@ctb ctb closed this as completed Sep 23, 2023