shared LCA index for many processes #909

phiweger · 2020-02-21T10:24:11Z

is there a way to load the LCA index into memory once, but then to query it with many different processes WITHOUT specifying a list of queries beforehand? ie to treat the index more like a database?

my use case is that I would like to use LCA as part of a workflow managed by snakemake, but I'd like to allow parallel queries. using eg sourmash lca classify ... only works if I pass all the genomes at once, which does not leverage the workflow manager's ability to distribute processes.

thx!


phiweger · 2020-02-21T10:24:41Z

can the LCA index be loaded into Redis @luizirber ?

luizirber · 2020-02-21T19:51:25Z

> is there a way to load the LCA index into memory once, but then to query it with many different processes WITHOUT specifying a list of queries beforehand? ie to treat the index more like a database?
>
> my use case is that I would like to use LCA as part of a workflow managed by snakemake, but I'd like to allow parallel queries. using eg sourmash lca classify ... only works if I pass all the genomes at once, which does not leverage the workflow manager's ability to distribute processes.

Right now, no... A very hackish way would be to do something like sourmash watch, but reading signatures from the input pipe. And maybe piped outputs in snakemake could be useful?
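To make the "read from the input pipe" idea concrete, here is a minimal sketch of a long-running worker that loads an index once and then answers one query per input line. It does not use the sourmash API: `load_index_once`, `serve_queries`, the hash keys, and the lineage strings are all hypothetical stand-ins, and a plain dictionary takes the place of the LCA index.

```python
from typing import Dict, Iterable, TextIO


def load_index_once() -> Dict[str, str]:
    """Hypothetical stand-in for the expensive, one-time index load."""
    return {"hash_a": "Bacteria;Proteobacteria", "hash_b": "Archaea"}


def serve_queries(index: Dict[str, str], queries: Iterable[str], out: TextIO) -> int:
    """Answer one query per input line until the input stream ends."""
    answered = 0
    for line in queries:
        key = line.strip()
        if not key:
            continue  # ignore blank lines
        lineage = index.get(key, "unassigned")
        out.write(f"{key}\t{lineage}\n")
        out.flush()  # flush so a downstream pipe sees each answer promptly
        answered += 1
    return answered
```

Calling `serve_queries(load_index_once(), sys.stdin, sys.stdout)` from a script would keep the index in memory for as long as the input pipe stays open, so the loading cost is paid once rather than once per query.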

> can the LCA index be loaded into Redis @luizirber ?

No... LCA doesn't implement any of the storages available for SBT (or even use the same logic). That said, it wouldn't be a stretch to add the current LCA format to a Redis database. I'm not sure that would make things faster, since every dictionary element access would have to go through Redis, but maybe it works?

I sort of started doing something for search in wort, but it is not ready yet (and won't be for the next 3 months, at least).

ctb · 2020-04-23T14:29:15Z

starting to discuss ways to improve LCA_Database loading time over in #821 and #948.

ctb · 2021-05-08T12:36:34Z

ref #1484 too - getting closer to being able to do this with generic RPC built around Index API.

phiweger · 2021-05-08T15:22:35Z

not sure if this helps, but one can run the metagraph index like a server and communicate with it via ports and some python bindings -- something similar would be so cool to have for SBT/ LCA indices -- https://github.com/ratschlab/metagraph
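As a rough illustration of the "index as a server" pattern, the sketch below keeps a lookup table in one process and answers queries sent over a local socket, using only Python's standard `multiprocessing.connection` module. It is not the metagraph or sourmash API: `load_index_once`, `serve`, `ask`, `shutdown`, `demo`, the auth key, and the dictionary contents are hypothetical names and values chosen for this example.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"demo-key"  # assumption: shared secret used only in this sketch


def load_index_once():
    """Hypothetical stand-in for the expensive, one-time index load."""
    return {"hash_a": "Bacteria;Proteobacteria", "hash_b": "Archaea"}


def serve(listener, index):
    """Handle one query per connection; a query of None stops the server."""
    while True:
        with listener.accept() as conn:
            query = conn.recv()
            if query is None:
                break
            conn.send(index.get(query, "unassigned"))


def ask(address, query):
    """Client side: connect, send one query, return the answer."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(query)
        return conn.recv()


def shutdown(address):
    """Ask the server loop to exit."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(None)


def demo():
    """Run the server in a background thread and query it twice."""
    # Port 0 lets the OS pick a free port; listener.address reports it.
    with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
        worker = threading.Thread(target=serve, args=(listener, load_index_once()))
        worker.start()
        answers = [ask(listener.address, q) for q in ("hash_a", "missing")]
        shutdown(listener.address)
        worker.join()
    return answers
```

In a workflow setting the server would run as its own process and each job would call something like `ask`. This sketch handles clients one at a time, so it shows how to share a loaded index across processes but does not make the queries themselves run in parallel.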

ctb · 2022-01-26T15:25:56Z

@phiweger #1808 might be interesting to you - sqlite3-based storage and search of databases, with fast startup and (presumably) multiple read-only clients. Does not yet implement the taxonomy side of LCA databases, but I think that's straightforward at this point, and I could prioritize if it meets your other needs.
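The "multiple read-only clients" point can be sketched with Python's standard `sqlite3` module alone. The table layout below is a made-up hash-to-lineage schema for illustration, not the schema used by #1808, and `build_demo_db`, `open_read_only`, and `lookup` are hypothetical helper names.

```python
import sqlite3
from pathlib import Path


def build_demo_db(path):
    """Create a tiny hash -> lineage table (hypothetical schema)."""
    con = sqlite3.connect(path)
    with con:  # commits on success
        con.execute(
            "CREATE TABLE lineages (hashval TEXT PRIMARY KEY, lineage TEXT)"
        )
        con.executemany(
            "INSERT INTO lineages VALUES (?, ?)",
            [("hash_a", "Bacteria;Proteobacteria"), ("hash_b", "Archaea")],
        )
    con.close()


def open_read_only(path):
    """Open the database file read-only through an SQLite URI."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def lookup(con, hashval):
    """Return the lineage for a hash, or None if it is not present."""
    row = con.execute(
        "SELECT lineage FROM lineages WHERE hashval = ?", (hashval,)
    ).fetchone()
    return row[0] if row else None
```

Each workflow job can call `open_read_only` on the same file and run `lookup` independently. Nothing has to be loaded into memory up front, which is where the fast startup comes from, and connections opened with `mode=ro` reject writes.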

phiweger · 2022-01-26T20:07:17Z

uh! well, this seems like a nice solution. I will try to code up a small workflow of read-in-parallel processes, thanks!

ctb · 2023-09-23T15:59:08Z

Closing, given that we are now shifting (slowly but surely) to mastiff style indices and other very performant on-disk data structures.

ctb mentioned this issue Apr 23, 2020

consider ways to improve speed of LCA database #821

Closed

ctb mentioned this issue Jun 20, 2020

SBT loading in memory #475

Closed

ctb mentioned this issue Jul 18, 2020

use a faster json library in lca/lca_db.py #1111

Closed

ctb mentioned this issue Jul 21, 2021

Multiple queries to gather #1681

Closed

ctb mentioned this issue Apr 19, 2022

[MRG] add sqlite3 implementations for Index, CollectionManifest, and LCA_Database #1808

Merged

ctb closed this as completed Sep 23, 2023
