differences between this plugin's gather and sourmash gather
#331
@bluegenes here's the notebook I'm using: https://github.com/ctb/2024-debug-gather-difference/blob/main/compare-picklist.ipynb it's mildly tricksy, because I had to force |
It looks like which suggests (but does not confirm ;) that |
Running with all these fixes, I now see the following differences remaining between the
Results over in ctb/2024-debug-gather-difference#1
After #353, remaining changes needed: -
I think the following require sourmash core changes:
Wondering if maybe ANI numbers in Rust are generally busted - per sourmash-bio/sourmash#3134 (comment). Something to investigate.
@bluegenes nah, looks like the ANI calculation code is good from
I COME BEARING NEWS! With the ANI fixes in #361 and sourmash-bio/sourmash#3218, the only remaining difference is in 🎉 Just to be clear: sourmash gather, fastgather, fastmultigather, and fastmultigather+rocksdb all return the SAME RESULTS now when using the comparison approach in ctb/2024-debug-gather-difference#1. Yay! The only remaining tricky bit that explained the
and somehow these two had equivalent unweighted matches (so you could pick either one legitimately!) but led to two different weighted results. I removed 🎉
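To illustrate the tie described above, here is a toy sketch (made-up hashes and abundances, not the data from the issue, and not the sourmash or plugin code). Two candidates overlap the query by the same number of hashes, so an unweighted greedy step can legitimately pick either, yet the weighted totals differ. The name-based tie-break is only one assumed way to make the choice reproducible.

```python
# Toy data (hypothetical): hash -> abundance in the query.
query_abunds = {1: 10, 2: 1, 3: 1, 4: 1}

# Two hypothetical candidate matches, each a set of hashes.
candidates = {"match_A": {1, 2}, "match_B": {3, 4}}


def unweighted_overlap(hashes):
    """Number of query hashes shared with the candidate."""
    return len(hashes & set(query_abunds))


def weighted_overlap(hashes):
    """Sum of query abundances over the shared hashes."""
    return sum(query_abunds[h] for h in hashes & set(query_abunds))


def pick_best(cands):
    """Largest unweighted overlap wins; ties are broken by name (an assumption)."""
    return min(cands, key=lambda name: (-unweighted_overlap(cands[name]), name))


for name, hashes in sorted(candidates.items()):
    print(name, unweighted_overlap(hashes), weighted_overlap(hashes))

print("picked:", pick_best(candidates))
```

Without an explicit tie-break, two implementations can both be correct and still report different weighted columns for that rank.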
Calculate ANI of matches against original query with `f_orig_query` and `f_match_orig`, instead of against `f_unique_to_query` and `f_match`. This fixes the ANI differences between `sourmash gather` and RocksDB branchwater gather for the columns `query_containment_ani`, `match_containment_ani`, `max_containment_ani`, and `average_containment_ani`.
Refs:
* Used by sourmash-bio/sourmash_plugin_branchwater#361
* Fixes RocksDB-based calculations for sourmash-bio/sourmash_plugin_branchwater#331
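A minimal sketch of why the choice of containment changes the ANI columns. It assumes the point-estimate relationship ANI ≈ containment^(1/k); the function name `containment_to_ani` and the numbers are hypothetical, and this is not the sourmash or plugin implementation.

```python
def containment_to_ani(containment: float, ksize: int) -> float:
    """Point estimate of ANI from k-mer containment (assumed: ANI = C ** (1/k))."""
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be in [0, 1]")
    return containment ** (1.0 / ksize)


ksize = 31

# Hypothetical values for a match at a late rank: the fraction of the original
# query it contains versus the fraction that is unique to it after earlier
# ranks have claimed their hashes.
f_orig_query = 0.20
f_unique_to_query = 0.05

ani_orig = containment_to_ani(f_orig_query, ksize)
ani_unique = containment_to_ani(f_unique_to_query, ksize)

print(f"ANI from f_orig_query:      {ani_orig:.4f}")
print(f"ANI from f_unique_to_query: {ani_unique:.4f}")
```

Because the unique fraction shrinks as earlier ranks claim shared hashes, an ANI derived from it is lower than one derived from the overlap with the original query, which is consistent with the ANI columns disagreeing between the two tools.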
Post-#298, we now get full `gather` results out of `fastgather` and `fastmultigather`. But there are some differences between what the plugin outputs and what the OG `sourmash gather` outputs 😱.

First, note that {'filename', 'md5', 'name'} in OG gather are now {'match_filename', 'match_md5', 'match_name'}.
Also, 'potential_false_negative' is missing from plugin gather.
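The renaming above can be applied mechanically before comparing columns. This is a sketch using plain dicts to stand in for CSV rows; the helper names and the example values are hypothetical, and only the column names come from the issue.

```python
# Column renames described above: OG gather name -> plugin gather name.
OG_TO_PLUGIN = {
    "filename": "match_filename",
    "md5": "match_md5",
    "name": "match_name",
}


def rename_og_columns(row: dict) -> dict:
    """Return a copy of an OG gather row using the plugin's column names."""
    return {OG_TO_PLUGIN.get(key, key): value for key, value in row.items()}


def column_differences(og_row: dict, plugin_row: dict):
    """Columns present on only one side, after renaming the OG columns."""
    og_cols = set(rename_og_columns(og_row))
    plugin_cols = set(plugin_row)
    return og_cols - plugin_cols, plugin_cols - og_cols


# Hypothetical example rows (values are placeholders).
og_row = {
    "filename": "db.zip",
    "md5": "abc123",
    "name": "genome A",
    "potential_false_negative": "False",
}
plugin_row = {
    "match_filename": "db.zip",
    "match_md5": "abc123",
    "match_name": "genome A",
}

only_og, only_plugin = column_differences(og_row, plugin_row)
print(sorted(only_og))      # ['potential_false_negative']
print(sorted(only_plugin))  # []
```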
After that is dealt with, the following columns are the same 🎉 -
rounding differences?
- `std_abund`, `f_unique_weighted` appear to be different just because they are floats.

trivial/easy to fix differences
- `moltype` is lowercase in plugin gather, so `dna` instead of `DNA`
- `query_md5` is truncated to 8 characters in the OG gather.
- `filename` means different things in OG gather and the plugin - in the OG gather, it's the filename of the database being searched, in the plugin it's ... the filename of the sig? not sure.

real differences
- `query_n_hashes` and `query_bp` are the original query (so, constant) in OG gather, while in the plugin they are the size of the remaining query at that rank
- `remaining_bp` is just different - looks like it's just being calculated very differently.
- `max_containment_ani` is just quite different...??
- `average_containment_ani` is just different, too
- `query_containment_ani` is also different
- `match_containment_ani` is also different

twilight zone differences
- `sum_weighted_found` values are all the same except for in one specific row. WTF.
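One way to separate the rounding and trivial differences listed above from the real ones is to normalize both rows and compare floats with a tolerance. This is a sketch, not the comparison notebook: the helper names, the tolerance, and the example values are assumptions; only the column names come from the issue.

```python
import math

# Columns the issue suspects differ only by float rounding.
FLOAT_COLUMNS = {"std_abund", "f_unique_weighted"}


def normalize(row: dict) -> dict:
    """Remove the trivial differences: moltype case and query_md5 length."""
    out = dict(row)
    if "moltype" in out:
        out["moltype"] = out["moltype"].lower()
    if "query_md5" in out:
        out["query_md5"] = out["query_md5"][:8]
    return out


def differing_columns(og_row: dict, plugin_row: dict, rel_tol: float = 1e-6):
    """Shared columns that still differ after normalization."""
    og, plugin = normalize(og_row), normalize(plugin_row)
    diffs = []
    for col in sorted(set(og) & set(plugin)):
        if col in FLOAT_COLUMNS:
            same = math.isclose(float(og[col]), float(plugin[col]), rel_tol=rel_tol)
        else:
            same = og[col] == plugin[col]
        if not same:
            diffs.append(col)
    return diffs


# Hypothetical rows for the same match at the same rank.
og_row = {"moltype": "DNA", "query_md5": "0123abcd",
          "std_abund": "1.2345678", "query_bp": "1000000"}
plugin_row = {"moltype": "dna", "query_md5": "0123abcd9999ffff",
              "std_abund": "1.23456781", "query_bp": "750000"}

print(differing_columns(og_row, plugin_row))  # ['query_bp']
```

Anything left in the returned list after this kind of normalization would belong in the "real differences" or "twilight zone" groups.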