MRG: update with misc animal genomes #3422
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## latest #3422 +/- ##
==========================================
- Coverage 86.44% 86.43% -0.02%
==========================================
Files 137 137
Lines 16103 16103
Branches 2219 2219
==========================================
- Hits 13920 13918 -2
- Misses 1876 1878 +2
Partials 307 307
Ready for review & merge @bluegenes @ccbaumler!
doc/databases.md
Outdated
@@ -30,6 +30,20 @@ The databases do not need to be unpacked or prepared in any way after download.

You can verify that they've been successfully downloaded (and view database properties such as `ksize` and `scaled`) with `sourmash sig summarize <output>`.

## Sketches for human and animal genomes

These include k=21, k=31, and k=51, at a scaled of 1000. Each file is about 110 MB.
-These include k=21, k=31, and k=51, at a scaled of 1000. Each file is about 110 MB.
+These signature files are representative sketches of model organisms. Their suggested usage is to identify and remove contamination from environmental sketches. These include k=21, k=31, and k=51, at a scaled of 1000. Each file is about 110 MB.
updated in dc9d435 with:
+These sketches are of the latest releases of a number of animal
+genomes. Among other uses, they can be used to detect host
+contamination in microbial metagenomes.
+
+Each file includes sketches at k=21, k=31, and k=51, at a scaled of
+1000, and is about 110 MB.
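
To make the host-contamination use case concrete, here is a minimal, self-contained Python sketch of the idea behind scaled sketches: hash every k-mer, keep roughly 1 in `scaled` of the hashes, and estimate how much of a metagenome sketch is contained in a host sketch. This is a toy illustration only. The function names (`kmers`, `hash_kmer`, `sketch`, `containment`) and the choice of BLAKE2b are assumptions made for this example; sourmash itself uses a different hash function and also handles reverse complements, so these values are not comparable to real sourmash signatures.

```python
import hashlib
import random

MAX_HASH = 2**64  # size of the 64-bit hash space used in this toy example


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, scaled=1000):
    """Keep only hashes below MAX_HASH / scaled, i.e. about 1 in `scaled` k-mers."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq.upper(), k)) if h < threshold}


def containment(query, reference):
    """Fraction of the query sketch's hashes that also occur in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


if __name__ == "__main__":
    rng = random.Random(0)
    host = "".join(rng.choice("ACGT") for _ in range(50_000))
    microbe = "".join(rng.choice("ACGT") for _ in range(50_000))
    # Hypothetical "metagenome": microbial sequence plus a chunk of host DNA.
    metagenome = microbe + host[:10_000]

    host_sketch = sketch(host, k=21, scaled=100)
    meta_sketch = sketch(metagenome, k=21, scaled=100)
    # Estimated fraction of the metagenome's k-mers that come from the host.
    print(containment(meta_sketch, host_sketch))
```

With the real databases, the equivalent check would be run through sourmash against the downloaded sketch files rather than with code like this.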
Also, would you like to add some of the more canonical organisms? Namely,
No, I just want to merge ;). I've added these to #2717 so we don't lose the suggestion. But the pipeline I'm using doesn't do anything fancy, and I don't want to add new genomes piecemeal (because it's work, and it's also error prone ;)) until we've automated the process more.
LGTM
Adds common hosts and also hg38.
Tackles #2717