-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add kraken2 phylogenetic assignment subworkflow #47
base: dev
Are you sure you want to change the base?
Conversation
|
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 2.14.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
75a8e33
to
3e0d357
Compare
Co-authored-by: Natalia Garcia Garcia <[email protected]>
I think, you have this here well covered. Therefore, I just wanted to point out, that a similar functionality was recently added to the rnaseq pipeline. So in case you still need some inspiration or have some more ugh moments (fancy commit messages^^), you might already find a suitable solution over there. |
Thank you! I took inspiration from the taxprofiler pipeline and tried simplifying it. I'll check how the |
… into feature/kraken2
… into feature/kraken2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! I have made some suggestions that maybe can be interesting, but feel free to apply them or not
@@ -22,6 +22,31 @@ process { | |||
ext.args = '--quiet' | |||
} | |||
|
|||
withName: 'KRAKEN2_KRAKEN2' { | |||
publishDir = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a general statement for this in this (file)[https://github.com/nf-core/seqinspector/blob/31c1f829d97c4b98d21b68beed4af050fd331a37/conf/modules.config#L15], so I don't think is needed to add it twice, except if that bit is going to be removed it later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is! I just added it here for two reasons: the first is that I wanted more descriptive names for the folders (kraken2_reports
instead of just kraken2
) and I wanted the krona plots to be inside the kraken2_reports
folder, with a more descriptive name as well.
The second reason I have added this seemingly redundant code is that kraken2 and kronatools can produce more output than what is produced now. I left these lines here looking into the future: they might need to be modified depending on the needs of the pipeline once it reaches a more stable status.
} | ||
|
||
withName: 'KRONA_KTIMPORTTAXONOMY' { | ||
publishDir = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as before
} | ||
|
||
withName: 'UNTAR' { | ||
publishDir = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure you want to output the kraken db, because it's size can be huge (depending on the selected one) and also it has been previously downloaded by the user, so already in user's device? You may want to use the storeDir in case you want to store the db and reuse it for later without the need of publishing it in the output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean! I could create a patch to the UNTAR module to add the storeDir directive, that would also need some changes to the config but it can be done.
In any case, to avoid unnecessary waste of space by saving the uncompressed database, the pipeline works differently if the user provides a gzipped database or an uncompressed one. If the pipeline is gzipped, the UNTAR
module uncompresses it and uses it, but by default, it won't save the uncompressed database if the user provided a compressed database.
The outputting of the uncompressed kraken2 db is turned off by default by the params.save_uncompressed_k2db
, which is set as false
. On the modules.config
file this is read by the enable
declaration.
If the database is uncompressed, and the user passes a path to the kraken2_db
param, the UNTAR
module is not called; the database is simply used and remains in the user's original directory.
nextflow.config
Outdated
@@ -19,6 +19,12 @@ params { | |||
igenomes_base = 's3://ngi-igenomes/igenomes/' | |||
igenomes_ignore = false | |||
|
|||
// Kraken2 options | |||
kraken2_db = 'https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/kraken2/testdb-kraken2.tar.gz' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which db is? there are different ones: https://benlangmead.github.io/aws-indexes/k2 which requires different resources depending on the size. The one used, even if its for test should be documented somewhere (maybe it's and I haven't seen it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the minimalest possible database for testing purposes, but I agree with you that it should not be default one, it should just be set to null
. I used the taxprofiler
test one, which was built like this: https://github.com/nf-core/test-datasets/blob/taxprofiler/README.md#kraken2
Removed debugging commented statement Co-authored-by: Natalia Garcia Garcia <[email protected]>
… into feature/kraken2
Added a subworkflow for a "phylogenetic QC" that does kraken2 assignment for each sample and then plots them on an interactive krona plot. I have not added the kraken2 reports to multiqc, this should be the next step.
PR checklist
nf-core lint
).nf-test test main.nf.test -profile test,docker
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).