Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: add utility functions for zip reading; use in index, multisearch #105

Merged
merged 4 commits into from
Sep 12, 2023

Conversation

bluegenes
Copy link
Contributor

@bluegenes bluegenes commented Sep 8, 2023

Extracted from #99, because all the changes were getting unruly.

This should be easier to review - contains the functions & modifications to use them in index and multisearch, with tests.

New in utils.py:

  • load_sigpaths_from_zip_or_pathlist
  • load_sketches_from_zip_or_pathlist
  • report_on_sketch_loading (unifies signature loading reporting)

Renamed:

  • read_signatures_from_zip --> load_sketches_from_zip
  • load_sketch_fromfile --> load_fasta_fromfile

Note: In the case of reading a pathlist from a zip file, this means we extract the sig files and store them in a temp directory which will be deleted when it goes out of scope. This adds a little overhead, but I think it could worth it for the simplicity of being able to specify a zip database. Open to removing, though.

@bluegenes
Copy link
Contributor Author

@ctb this is ready for review

src/utils.rs Outdated
sketchlist.push(sm);
} else {
// track number of paths that have no matching sigs
skipped_paths.fetch_add(1, atomic::Ordering::SeqCst);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need atomic types? This loop doesn't look like it's parallelized, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, not parallelized.

There is a parallel zip library we could use if we want, but probably not needed except when reading from genbank-scale zips.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed atomic in 1ea65be

@ctb
Copy link
Collaborator

ctb commented Sep 9, 2023

I like overall, but want to dig in to code a bit.

src/utils.rs Show resolved Hide resolved
src/utils.rs Show resolved Hide resolved
src/utils.rs Show resolved Hide resolved
Copy link
Collaborator

@ctb ctb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! good code simplification too 👍

@bluegenes bluegenes merged commit 59cd5a0 into main Sep 12, 2023
1 check passed
@bluegenes bluegenes deleted the add-zip-utils branch September 12, 2023 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants