updated minSupportPercent param description
dportik committed Feb 8, 2022
1 parent cc7bb85 commit 678ae04
Showing 2 changed files with 7 additions and 5 deletions.
10 changes: 6 additions & 4 deletions Taxonomic-Functional-Profiling-Protein/config.yaml
@@ -54,8 +54,10 @@ sam2rma:
# possible values = readCount, readLength, alignedBases, readMagnitude

# Minimum support as percent of assigned reads. Default in MEGAN is 0.05, but with HiFi
-# the best value is 0.01. This provides an optimal trade-off between precision and recall,
-# with near perfect detection of species down to 0.1-0.02% abundance (based on mock community)
-# data. To recover more species at lower abundances (which may be false positives), this
-# can be changed to 0.001.
+# the optimized value is 0.01. This provides a balanced trade-off between precision and recall
+# (based on mock community datasets), with near perfect detection of species down to
+# ~0.04% abundance. To avoid any filtering based on this threshold, use a value of 0
+# instead. This will report ALL assigned reads, which will include potentially thousands
+# of false positives at ultra-low abundances (<0.01%), similar to results from
+# short-read methods (e.g., Kraken2, Centrifuge, etc).
minSupportPercent: 0.01
2 changes: 1 addition & 1 deletion docs/Tutorial-Taxonomic-Functional-Profiling-Protein.md
@@ -143,7 +143,7 @@ Depending on your system resources, you may choose to change the number of threa

The `hit_limit` argument allows you to specify the type of hit limit method and corresponding value. You can choose between the `--top` method or `-k` method, which are used with the range-culling mode (see [DIAMOND documentation](http://www.diamondsearch.org/index.php?pages/command_line_options/)). The default is `--top 5`, meaning a hit will only be deleted if its score is more than 5% lower than that of a higher scoring hit over at least 50% of its query range. Using `-k 5` instead means that a hit will only be deleted if at least 50% of its query range is spanned by at least 5 higher or equal scoring hits. In general, the `-k` method will keep far fewer hits, and specifying `-k 1` will keep a single hit per query range. This can be useful for 1) very simple metagenomic communities, or 2) reducing the output file size. If you choose to modify the `hit_limit` argument, you will want to supply the complete DIAMOND flag (e.g., `-k 3` or `--top 10`).

-Finally, consider the `minSupportPercent` argument, which is the minimum support as percent of assigned reads required to report a taxon. The default in MEGAN is 0.05, but with HiFi the best value appears to be 0.01. This provides an optimal trade-off between precision and recall, with near perfect detection of species down to 0.1-0.02% abundance based on mock community datasets. To recover more species at lower abundances (which are likely to be false positives), this can be changed to 0.001, or even 0.00001. This will primarily increase the number of species found at ultra-low abundances (<0.01%), and provide results similar to short-read methods (e.g., Kraken2, Bracken, Centrifuge).
+Finally, consider the `minSupportPercent` argument, which is the minimum support as percent of assigned reads required to report a taxon. The default in MEGAN is 0.05, but with HiFi the best value appears to be 0.01. This provides an optimal trade-off between precision and recall, with near perfect detection of species down to ~0.04% abundance. To avoid any filtering based on this threshold, use a value of 0 instead. This will report ALL assigned reads, which will potentially include thousands of false positives at ultra-low abundances (<0.01%), similar to results from short-read methods (e.g., Kraken2, Centrifuge, etc). Make sure you filter such files after the analysis to reduce false positives!

**You must also specify the full paths to `sam2rma`, the MEGAN mapping database file, and the indexed NCBI-nr database (`diamond_nr_db.dmnd`)**.
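
To make the `minSupportPercent` description concrete, here is a minimal Python sketch of what a percent-of-assigned-reads threshold means in read counts, and of the kind of post-analysis filtering the tutorial recommends when the value is set to 0. The function names (`min_support_reads`, `filter_taxa`) and the example counts are hypothetical and are not part of the pipeline or of MEGAN. The exact rounding MEGAN applies is not stated in the text, so rounding up is an assumption. MEGAN may also reassign reads from low-support taxa to higher ranks rather than discard them, whereas this sketch simply drops those taxa.

```python
import math
from fractions import Fraction


def min_support_reads(assigned_reads: int, min_support_percent: float) -> int:
    """Read count a taxon needs to reach the given percent of assigned reads.

    Assumption: the threshold is rounded up and is never below 1 read;
    a value of 0 is treated as "no filtering".
    """
    if min_support_percent <= 0:
        return 1
    # Use exact rational arithmetic to avoid floating-point rounding surprises.
    percent = Fraction(str(min_support_percent))
    return max(1, math.ceil(percent * assigned_reads / 100))


def filter_taxa(read_counts: dict, min_support_percent: float) -> dict:
    """Keep only taxa whose share of all assigned reads meets the threshold.

    `read_counts` maps a taxon name to its assigned read count (hypothetical
    input format). Taxa below the threshold are dropped, not reassigned.
    """
    total = sum(read_counts.values())
    threshold = min_support_reads(total, min_support_percent)
    return {taxon: n for taxon, n in read_counts.items() if n >= threshold}


# Hypothetical example: 10,000 assigned reads across three taxa.
counts = {"taxon_A": 9000, "taxon_B": 996, "taxon_C": 4}
kept_megan_default = filter_taxa(counts, 0.05)  # MEGAN default from the text
kept_hifi = filter_taxa(counts, 0.01)           # value recommended for HiFi
kept_all = filter_taxa(counts, 0)               # no filtering
```

In this made-up example, `taxon_C` holds 0.04% of the assigned reads, so it falls below the 0.05 threshold but passes at 0.01, and a value of 0 keeps every taxon.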
