[EXP] add a prefetch linear search function to Index #1371
Fun fact!! When turning […] The two that don't pass are -
Legend has it that @luizirber will soon come place a comment on here about how […] This appears to be accurate, with one exception: the […]
I'm not sure they are the same same, but it sure feels like […]
absolutely agree (I assume you mean prefetch/find and filter/select).
…o refactor/index_find
* have the 'find' function for SBTs return signatures
* fix majority of tests
* split find and _find_nodes to take different kinds of functions
* redo 'find' on index
* refactor lca_db to use new find
* refactor SBT to use new find
* refactor out common code
* use 'passes' properly
* adjust tree downsampling for regular minhashes, too
* remove now-unused search functions in sbtmh
* refactor categorize to use new find
* fix jaccard calculation in sbt
* check for compatibility of search fn and query signature
* switch tests over to jaccard similarity, not containment
* remove test for unimplemented LCA_Database.find method
* document threshold change; update test
* refuse to run abund signatures
* flatten sigs internally for gather
* reinflate abundances for saving
* fix problem where sbt indices could be created with abund signatures
* split flat and abund search
* make ignore_abundance work again for categorize
* turn off best-only, since it triggers on self-hits.
* add test: 'sourmash index' flattens sigs
* location is now a property
* move search code into search.py
* remove redundant scaled checking code
* best-only now works properly for two tests
* 'fix' tests by removing v1 and v2 SBT compatibility
* simplify downsampling code
* require keyword args in MinHash.downsample(...)
* fix test to use proper downsampling, reverse order to match scaled
* flatten subject MinHash, too
* add IndexSearchResult namedtuple for search and gather results
* add more tests for Index classes
* add tests for subj & query num downsampling
* tests for Index.search_abund
* refactor make_jaccard_search_query; start tests
* test collect, best_only
* deal with status == None on SystemExit
* upgrade and simplify categorize
* fix abundance search in SBT for categorize
* add explicit test for incompatible num
* add simple tests for SBT load and search API
* allow arbitrary kwargs for LCA_Database.find
* add testing of passthru-kwargs
* docstring updates
* better tests for gather --save-unassigned
* SBT search doesn't work on v1 and v2 SBTs b/c no min_n_below
* add intersection_and_union_size method to MinHash
* make flatten a no-op if track_abundance=False
* intersection_union_size in the FFI

Co-authored-by: Luiz Irber <[email protected]>
merged into #1370.
NOTE: PR into #1370.

This PR adds:

* a --prefetch option (currently defaulting to True) to the sourmash gather CLI
* a prefetch generator method on Index that does the linear search from #1370 ("[MRG] refactor gather functionality for speed & modularity; provide prefetch functionality.")/greyhound/etc.

See comment on #1370 for motivation.
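
To illustrate the idea of a prefetch generator that linearly scans an index, here is a minimal sketch. It is not sourmash's actual implementation: `ToySignature`, `ToyIndex`, `PrefetchResult`, and the `threshold` parameter are all hypothetical names, and signatures are modelled as plain sets of integer hashes rather than real MinHash sketches.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, NamedTuple


@dataclass(frozen=True)
class ToySignature:
    """Hypothetical stand-in for a signature: a name plus a set of hashes."""
    name: str
    hashes: FrozenSet[int]


class PrefetchResult(NamedTuple):
    """Hypothetical result record: the matching signature and its overlap."""
    signature: ToySignature
    overlap: int  # number of hashes shared with the query


class ToyIndex:
    """Hypothetical index that simply stores signatures in a list."""

    def __init__(self, signatures: Iterable[ToySignature]) -> None:
        self.signatures = list(signatures)

    def prefetch(self, query: ToySignature, threshold: int = 1) -> Iterator[PrefetchResult]:
        # Linear search: visit every stored signature exactly once and
        # lazily yield those sharing at least `threshold` hashes with the query.
        for sig in self.signatures:
            overlap = len(query.hashes & sig.hashes)
            if overlap >= threshold:
                yield PrefetchResult(sig, overlap)


query = ToySignature("query", frozenset({1, 2, 3, 4}))
index = ToyIndex([
    ToySignature("a", frozenset({1, 2, 9})),
    ToySignature("b", frozenset({7, 8})),
    ToySignature("c", frozenset({2, 3, 4, 5})),
])

# Collect the candidates once; a later gather-style step could then work
# on this smaller set instead of the whole index.
matches = list(index.prefetch(query, threshold=2))
```

Because `prefetch` is a generator, the caller decides whether to consume matches one at a time or to collect them all up front, as the last line does.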
Checklist

* make test: Did it pass the tests?
* make coverage: Is the new code covered?
* […] without a major version increment. Changing file formats also requires a major version number increment.
* […] changes were made?