
support sketch fromfile parallelism with manifest output #2033

Open
ctb opened this issue May 5, 2022 · 0 comments

ctb (Contributor) commented May 5, 2022

Right now, sourmash sketch fromfile doesn't readily support building signatures in parallel, because we don't have a good way for multiple processes to save signatures to disk safely at the same time, per #1911.

So, for example, you can't have two processes writing to the same collection (e.g. the same .zip file).

However, using the collection manifest support (e.g. as used for wort-genomes #1965), we could definitely do the following:

  • write signatures to different .zip files, one for each sketch fromfile process
  • build a combined manifest across all of the zip files, so that each signature is loaded from whichever zip file holds it
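A minimal sketch of the two steps above, using only the Python standard library rather than sourmash's own API. Everything in it is hypothetical: the function names write_part, build_combined_manifest and load_from_manifest, the JSON records standing in for signatures, and the three manifest columns, which do not claim to match sourmash's real manifest format. Each worker process owns exactly one zip file, so no two processes ever write to the same file; the combined manifest is built afterwards by scanning every finished zip.

```python
import csv
import json
import tempfile
import zipfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

# Hypothetical manifest columns; NOT sourmash's actual manifest schema.
FIELDS = ["zip_path", "internal_location", "name"]


def write_part(zip_path, records):
    """Write one worker's records into its own zip file.

    Only this worker ever opens zip_path, so no cross-process locking is
    needed. Each record is a dict with a "name" key; the JSON payload is a
    hypothetical stand-in for a real signature.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i, record in enumerate(records):
            zf.writestr(f"signatures/{i}.sig.json", json.dumps(record))
    return str(zip_path)


def build_combined_manifest(zip_paths, manifest_path):
    """After all workers finish, index every entry of every zip in one CSV."""
    with open(manifest_path, "w", newline="") as fp:
        writer = csv.DictWriter(fp, fieldnames=FIELDS)
        writer.writeheader()
        for zip_path in zip_paths:
            with zipfile.ZipFile(zip_path) as zf:
                for internal in zf.namelist():
                    record = json.loads(zf.read(internal))
                    writer.writerow({"zip_path": str(zip_path),
                                     "internal_location": internal,
                                     "name": record["name"]})


def load_from_manifest(manifest_path):
    """Yield (name, record), opening whichever zip each manifest row points at."""
    with open(manifest_path, newline="") as fp:
        for row in csv.DictReader(fp):
            with zipfile.ZipFile(row["zip_path"]) as zf:
                yield row["name"], json.loads(zf.read(row["internal_location"]))


if __name__ == "__main__":
    chunks = [
        [{"name": "genome-a", "hashes": [1, 2]}],
        [{"name": "genome-b", "hashes": [3]}, {"name": "genome-c", "hashes": [4, 5]}],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = [Path(tmp) / f"part-{i}.zip" for i in range(len(chunks))]
        # One worker process per chunk; each writes only its own zip file.
        with ProcessPoolExecutor() as pool:
            written = list(pool.map(write_part, paths, chunks))
        manifest = Path(tmp) / "manifest.csv"
        build_combined_manifest(written, manifest)
        for name, record in load_from_manifest(manifest):
            print(name, record["hashes"])
```

Because the manifest is derived only from the finished zip files, the workers never coordinate with each other; in a multi-node workflow, each write_part call could be a separate job, with build_combined_manifest as a final gathering step.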

We'd have to do some more testing to make sure this works. We could also implement it all in one go, creating the SQL manifest up front and having each process add to it as it finishes, though I'm not sure that's necessary.
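For the all-in-one-go variant, here is a similarly hedged sketch using Python's built-in sqlite3 module: the SQL manifest is created once up front, and each process appends its rows only after closing its own zip file. The table layout and the names create_manifest_db, write_part_and_register and list_manifest are hypothetical and are not sourmash's SQL manifest schema. SQLite's file locking serializes concurrent writers, and the timeout argument makes a blocked writer wait for the lock instead of failing immediately.

```python
import json
import sqlite3
import tempfile
import zipfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def create_manifest_db(db_path):
    """Create the shared (hypothetical) manifest table once, before any worker starts."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS manifest ("
                " zip_path TEXT NOT NULL,"
                " internal_location TEXT NOT NULL,"
                " name TEXT NOT NULL,"
                " PRIMARY KEY (zip_path, internal_location))"
            )
    finally:
        conn.close()


def write_part_and_register(db_path, zip_path, records):
    """Write this worker's own zip file, then add its rows to the shared manifest."""
    rows = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i, record in enumerate(records):
            internal = f"signatures/{i}.sig.json"
            zf.writestr(internal, json.dumps(record))
            rows.append((str(zip_path), internal, record["name"]))
    # Register only after the zip is closed, so the manifest never points
    # at a half-written file.
    conn = sqlite3.connect(db_path, timeout=60)
    try:
        with conn:  # one transaction, committed on success
            conn.executemany("INSERT INTO manifest VALUES (?, ?, ?)", rows)
    finally:
        conn.close()
    return len(rows)


def list_manifest(db_path):
    """Return all manifest rows, ordered by name."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT zip_path, internal_location, name FROM manifest ORDER BY name"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    chunks = [[{"name": "genome-a"}], [{"name": "genome-b"}, {"name": "genome-c"}]]
    with tempfile.TemporaryDirectory() as tmp:
        db = str(Path(tmp) / "manifest.sqlite")
        create_manifest_db(db)  # created up front, before any worker runs
        zips = [str(Path(tmp) / f"part-{i}.zip") for i in range(len(chunks))]
        with ProcessPoolExecutor() as pool:
            counts = list(pool.map(write_part_and_register,
                                   [db] * len(chunks), zips, chunks))
        print(counts, list_manifest(db))
```

One caveat for multi-node runs: SQLite's own documentation warns that file locking can be unreliable on network filesystems such as NFS, so a shared database like this is safest on a single machine, while building the manifest afterwards avoids shared writes entirely.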

I'd probably implement this in Snakemake, to support multi-node parallelism, but it might need some additional support.
