what can we use for multithreaded/multiprocess safe signature writing? #1911

Open
ctb opened this issue Mar 30, 2022 · 4 comments

Comments

@ctb
Contributor

ctb commented Mar 30, 2022

per #1671 (comment), we don't have a good way to write signatures to a single file from multiple processes. Such a thing would be nice, but would probably entail some kind of locking...

This is a challenge for things like sourmash sketch fromfile where we would like to have parallel signature sketching but would need to make the output single-threaded.

@ctb
Contributor Author

ctb commented May 5, 2022

keywords: parallel output

@ctb
Contributor Author

ctb commented May 5, 2022

idea in #2033: write to many different files, use a single manifest to point at the different files.
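
A minimal sketch of that idea, under stated assumptions: each worker writes its own file, so no two writers ever share a file handle, and a single manifest is written once at the end. The names `write_one` and `write_with_manifest`, the JSON payload, and the CSV columns are all hypothetical placeholders; this is not sourmash's manifest format or API.

```python
import csv
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_one(out_dir, index, name, sequence):
    # Hypothetical stand-in for sketching: hash the sequence instead.
    md5 = hashlib.md5(sequence.encode()).hexdigest()
    # One file per record, named by index, so workers never collide.
    path = Path(out_dir) / f"sketch_{index}.json"
    path.write_text(json.dumps({"name": name, "md5": md5}))
    return {"name": name, "md5": md5, "location": path.name}


def write_with_manifest(inputs, out_dir, n_workers=4):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def task(item):
        index, (name, sequence) = item
        return write_one(out_dir, index, name, sequence)

    # Workers write their own files concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        rows = list(pool.map(task, enumerate(inputs)))

    # The manifest is written by one caller, after all workers finish.
    manifest = out_dir / "manifest.csv"
    with open(manifest, "w", newline="") as fp:
        writer = csv.DictWriter(fp, fieldnames=["name", "md5", "location"])
        writer.writeheader()
        writer.writerows(rows)
    return manifest
```

Threads are used here only to keep the example portable; the same layout should work with separate processes, since the only shared output is the manifest and it is written once.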

@ctb
Contributor Author

ctb commented Sep 4, 2022

@dkoslicki asked about multithreaded sketching on matrix chat; I responded:

dug into this a bit more, the real problem is here: #1911 - we don't have a good way to write sketches to a single file from multiple processes.

@luizirber added -

But we can set a specific thread to write and still calculate the sigs in parallel. I did a quick skim of sourmash sketch fromfile and the place that needs to change is https://github.com/sourmash-bio/sourmash/blob/b1ddabcb05d3455affa862df33b039348c437d61/src/sourmash/command_sketch.py#L301L351

Either switching to multiprocessing or rewriting this function in Rust can achieve parallelism. It is easier if the order in which the sketches are added is not important (and I don't think it is? manifests are the source of order), but even if they need to stay in the same order it is doable.
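
The "one dedicated writer" pattern described above could look roughly like the following. This is a sketch under assumptions, not the code in `command_sketch.py`: `compute_sketch`, `worker`, `writer`, and `sketch_all` are hypothetical names, the JSON-lines output is a placeholder for a real signature format, and threads stand in for processes to keep the example self-contained (pure-Python work would not actually run in parallel under the GIL).

```python
import hashlib
import json
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def compute_sketch(name, sequence):
    # Hypothetical stand-in for real sketching; not sourmash's API.
    return {"name": name, "md5": hashlib.md5(sequence.encode()).hexdigest()}


def worker(name, sequence, results):
    # Workers only compute; they hand results to the writer via the queue.
    results.put(compute_sketch(name, sequence))


def writer(out_path, results):
    # Only this thread touches the output file, so no locking is needed.
    with open(out_path, "w") as fp:
        while True:
            item = results.get()
            if item is None:  # sentinel: all workers are done
                break
            fp.write(json.dumps(item) + "\n")


def sketch_all(inputs, out_path, n_workers=4):
    results = queue.Queue()
    writer_thread = threading.Thread(target=writer, args=(out_path, results))
    writer_thread.start()
    try:
        # Leaving the with-block waits for every submitted task to finish.
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            for name, sequence in inputs:
                pool.submit(worker, name, sequence, results)
    finally:
        results.put(None)
        writer_thread.join()
```

Records are written in completion order, which matches the case where output order does not matter. If order must be preserved, one option would be to tag each result with its input index and have the writer buffer out-of-order items until the next expected index arrives.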

@ctb
Contributor Author

ctb commented Sep 23, 2023

see plugin https://github.com/sourmash-bio/pyo3_branchwater, manysketch command, which writes to zip files.
