Feature request: Reformat gather output to include taxonomy #384
Comments
Note that we have no requirement for taxonomic information at present in the
gather database, so we have to think about how we could integrate it when
it is present but do without it when it isn't.
That's tricky but "undefined" might work. Not sure if there are standards but I'll look around.
Just revisiting this with some more thoughts: gather works by finding the signature among the search subjects that best matches the hashes in the query, subtracting the matched hashes, and then repeating with the remaining hashes. The name output by gather comes from the name of each found signature. So to fix this we would have to update signatures to carry taxonomic information (which is a big burden on the user; it's a reason I frequently don't use the lca search!). But it might be possible to run the same 'gather' algorithm against an lca database, so you'd have 'sourmash lca gather' that would output taxonomic info. Hmm.
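For readers following along, the loop described above can be sketched roughly as follows. This is only an illustration using plain Python sets of hashes, not sourmash's actual implementation; the function name `greedy_gather` and the `subjects` mapping are hypothetical.

```python
def greedy_gather(query_hashes, subjects):
    """Greedy sketch of the gather idea.

    query_hashes: iterable of hash values from the query (assumed input).
    subjects: dict mapping a signature name to its set of hashes (assumed input).
    Returns (matches, remaining), where matches is a list of
    (name, number_of_hashes_matched) in the order they were found.
    """
    remaining = set(query_hashes)
    matches = []
    while remaining:
        best_name, best_overlap = None, set()
        # Find the subject sharing the most hashes with what is left of the query.
        for name, hashes in subjects.items():
            overlap = remaining & hashes
            if len(overlap) > len(best_overlap):
                best_name, best_overlap = name, overlap
        if best_name is None:
            break  # nothing left in the query matches any subject
        matches.append((best_name, len(best_overlap)))
        # Subtract the matched hashes and repeat with the remainder.
        remaining -= best_overlap
    return matches, remaining


if __name__ == "__main__":
    subjects = {"A": {1, 2, 3, 4, 5}, "B": {4, 5, 6, 7}, "C": {20}}
    print(greedy_gather(range(1, 11), subjects))
```

The only name available in each match is the key of the signature that was found, which is why the output carries no taxonomy unless the subjects themselves do.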
See #390; example output so far:
Closed by #390.
Can we modify the gather output to separate matches into columns based on taxonomic rank? It's sometimes difficult to do this at the command line when the species or strain id spans multiple fields. For example, KQ235715.1 Fusobacterium nucleatum subsp. animalis D11 genomic scaffold adfWA-supercont2.1. I think @ctb may have already done this for lca.
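As a rough illustration of the request, the sketch below splits a signature name into an accession plus one column per rank, filling "undefined" where no taxonomy is known. It is not the format produced by sourmash; the rank list, the helper names, and the `lineages` lookup table (accession to rank mapping) are all assumptions made for the example.

```python
# Assumed rank columns; a real tool may use a different set or order.
RANKS = ["superkingdom", "phylum", "class", "order",
         "family", "genus", "species", "strain"]


def split_name(name):
    """Split 'ACCESSION free text description' at the first space."""
    accession, _, description = name.partition(" ")
    return accession, description


def taxonomy_row(name, lineages):
    """Build one output row: accession, one column per rank, description.

    lineages is a hypothetical dict: accession -> {rank: taxon name}.
    Ranks with no information are reported as 'undefined'.
    """
    accession, description = split_name(name)
    lineage = lineages.get(accession, {})
    return ([accession]
            + [lineage.get(rank, "undefined") for rank in RANKS]
            + [description])


if __name__ == "__main__":
    name = ("KQ235715.1 Fusobacterium nucleatum subsp. animalis D11 "
            "genomic scaffold adfWA-supercont2.1")
    # Illustrative lineage only; just the ranks readable from the name itself.
    lineages = {"KQ235715.1": {"genus": "Fusobacterium",
                               "species": "Fusobacterium nucleatum"}}
    print("\t".join(["accession"] + RANKS + ["description"]))
    print("\t".join(taxonomy_row(name, lineages)))
```

Keeping the ranks in separate tab-delimited columns means a tool like `cut` can pull out the species column directly, even when the original name spans several space-separated fields.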