WIP: Add scripts/pairwise_to_matrix/ helper script #198

Open
wants to merge 4 commits into
base: main

Conversation

mr-eyes
Member

@mr-eyes mr-eyes commented Jan 27, 2024

Description

This pull request adds a new 'scripts/' directory to the Branchwater project. The directory is dedicated to stand-alone scripts designed to post-process the output generated by various Branchwater commands.

Adding scripts/pairwise_to_matrix

This script efficiently converts pairwise command outputs into sparse matrices and enables their export to TSV format. The script uses Dask, so it's engineered for large-scale data processing.
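For readers unfamiliar with the conversion, here is a minimal standard-library sketch of the idea. It is not the Dask-based script from this PR. The column names `query_name`, `match_name`, and `jaccard` are assumptions about the pairwise CSV, and filling the diagonal with 1.0 assumes a self-similarity score; adjust both to the actual output.

```python
import csv
import io
from collections import defaultdict


def pairwise_to_sparse(rows, row_key="query_name", col_key="match_name",
                       value_key="jaccard"):
    """Collect pairwise rows into a dict-of-dicts sparse matrix.

    The default column names are assumptions, not taken from the PR.
    """
    matrix = defaultdict(dict)
    names = set()
    for row in rows:
        q, m = row[row_key], row[col_key]
        value = float(row[value_key])
        # Store both directions so the matrix is symmetric.
        matrix[q][m] = value
        matrix[m][q] = value
        names.update((q, m))
    return sorted(names), matrix


def write_dense_tsv(names, matrix, out, missing=0.0, diagonal=1.0):
    """Write the sparse matrix as a labeled, dense TSV table."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + names)
    for r in names:
        cells = [diagonal if r == c else matrix[r].get(c, missing)
                 for c in names]
        writer.writerow([r] + cells)


# Tiny made-up input, for illustration only.
example_csv = "query_name,match_name,jaccard\nA,B,0.5\nA,C,0.25\n"
names, matrix = pairwise_to_sparse(csv.DictReader(io.StringIO(example_csv)))
buf = io.StringIO()
write_dense_tsv(names, matrix, buf)
tsv_text = buf.getvalue()
```

Only pairs present in the input are stored, so memory grows with the number of reported pairs rather than with the square of the number of sketches. Writing the dense TSV is the step that scales quadratically.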

@ctb
Collaborator

ctb commented Jan 28, 2024

thanks!!

The downside to this approach is that scripts/ isn't installed by Python currently ;). So more work needed there...

@ctb
Collaborator

ctb commented Jan 28, 2024

and, as noted in the docs, lots of tricky dependencies - esp h5py. I suspect this will work well as a separate plugin!

@ctb ctb changed the title from "Add scripts/pairwise_to_matrix" to "WIP: Add scripts/pairwise_to_matrix/ helper script" on Jan 28, 2024
@ctb
Collaborator

ctb commented May 19, 2024

a dumb, and presumably much less efficient, version of this functionality is available in https://github.com/sourmash-bio/sourmash_plugin_betterplot/ as pairwise_to_matrix.

@ctb
Copy link
Collaborator

ctb commented Jun 21, 2024

> a dumb, and presumably much less efficient, version of this functionality is available in https://github.com/sourmash-bio/sourmash_plugin_betterplot/ as pairwise_to_matrix.

To resolve this PR, we should benchmark it against pairwise_to_matrix in betterplot. If it is in fact much, much less memory intensive and/or faster, we should keep it and consider merging (or incorporating in a different plugin). If it's reasonably comparable, I would argue for instead closing it and putting effort into testing the pairwise_to_matrix code :).
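One way such a comparison could be set up is sketched below, using only the standard library. The two `build_*` functions are hypothetical stand-ins, not the actual implementations; in practice each would wrap a call into the respective tool. Note that `tracemalloc` only sees allocations made through Python's allocator and slows execution, so for command-line runs an external measurement of wall time and peak RSS is likely more representative.

```python
import time
import tracemalloc


def benchmark(func, *args, repeats=3, **kwargs):
    """Return (best wall-clock seconds, peak traced bytes) over `repeats` runs."""
    best_time = float("inf")
    peak_bytes = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best_time = min(best_time, elapsed)
        peak_bytes = max(peak_bytes, peak)
    return best_time, peak_bytes


# Hypothetical stand-ins for the two implementations under comparison.
def build_dense(n):
    return [[0.0] * n for _ in range(n)]


def build_sparse(n):
    return {i: {i: 1.0} for i in range(n)}


dense_time, dense_peak = benchmark(build_dense, 300)
sparse_time, sparse_peak = benchmark(build_sparse, 300)
print(f"dense:  {dense_time:.4f}s, peak {dense_peak / 1024:.0f} KiB")
print(f"sparse: {sparse_time:.4f}s, peak {sparse_peak / 1024:.0f} KiB")
```

Running both implementations on the same pairwise file at a few input sizes would show whether the difference in time or memory is large enough to justify keeping this script.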
