Added ability to auto-create samplesheet for nf-core/phageannotator #543

Draft
wants to merge 22 commits into
base: dev
34 changes: 30 additions & 4 deletions subworkflows/local/generate_downstream_samplesheet.nf
@@ -5,6 +5,7 @@ workflow GENERATE_DOWNSTREAM_SAMPLESHEET {
downstream_nfcore_pipelines // val: [ nf-core-pipeline, OPTIONAL: other-nf-core-pipelines ]
short_reads // channel: [val(meta), path(fastq_1), path(fastq_2)]
assemblies // channel: [val(meta), path(fasta)]

main:

ch_versions = Channel.empty()
@@ -103,10 +104,35 @@ workflow GENERATE_DOWNSTREAM_SAMPLESHEET {
.set { ch_mag_metadata }
}

// Create samplesheet for each sample using meta information
ch_mag_id_samplesheets = ch_mag_metadata.collectFile() { meta ->
[ "${meta.id}_phageannotator_samplesheet.csv", "sample,group,fastq_1,fastq_2,fasta" + '\n' + "${meta.id},${meta.group},${meta.fastq_1},${meta.fastq_2},${meta.fasta}" + '\n' ]
}
// Create samplesheet for each sample using meta information
ch_mag_id_samplesheets = ch_mag_metadata.collectFile() { meta ->
// Save reads and assemblies to outdir so that they are in a stable location
file(meta.fastq_1.toUriString(), checkIfExists: true).copyTo("${params.outdir}/downstream_samplesheets/fastq/${meta.fastq_1.name}")
file(meta.fasta, checkIfExists: true).copyTo("${params.outdir}/downstream_samplesheets/fasta/${meta.fasta.name}")
Comment on lines +110 to +111
Member

I'm asking for some advice about this. This is indeed very clean, but given it's outside a process/publishDir, I'm a bit unsure. E.g. some people use symlinks from the work dir to the results dir rather than copies, so this would violate that.

Secondly, this would potentially result in two copies of the same read files if one of the --save_* parameters is given.

It may be that we have to come up with some complicated logic that instead picks the right 'final' directory for the processed reads, and it's that path you prepend to the file name and then add to the samplesheet.

I think this would come with two steps:

  1. Make sure one of --save_clipped_reads, --save_hostremoved_reads, --save_phixremoved_reads, or --save_bbnorm_reads is selected if params.generate_samplesheet is true.

    So some input validation code like:

    if ( params.generate_samplesheet && ![params.save_clipped_reads, params.save_hostremoved_reads <...>].any() ) { Nextflow.error('[nf-core/mag] ERROR: must at least save one XYX if --generate_samplesheet <...>') }

  2. Then we'd have to have some complicated logic to select which directory gets prepended to the file name (probably best as a case/when, but w/e), e.g.

    if (params.save_clipped_reads && !params.save_phixremoved_reads && !params.save_hostremoved_reads && !params.save_bbnorm_reads) {
        samplesheet_reads_dir = /subdir/clipedreads/saved/in
    } else if (params.save_phixremoved_reads && !params.save_hostremoved_reads && !params.save_bbnorm_reads) {
        samplesheet_reads_dir = /subdir/phixreads/saved/in
    }
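
The two steps suggested above could be sketched roughly as follows, in plain Groovy so that it runs outside Nextflow. Everything here is illustrative: `selectReadsDir`, the `READ_DIRS` map, and its directory names are hypothetical; the flag names are the ones listed in step 1; and the precedence (later entries win) is only inferred from the if/else chain in the comment, not checked against the pipeline. An `IllegalArgumentException` stands in for `Nextflow.error`.

```groovy
// Hypothetical mapping from save flag to the sub-directory its reads are published in.
// The directory names are placeholders, not the pipeline's real output layout.
// Entries are assumed to be ordered from earliest to latest processing step.
final Map<String, String> READ_DIRS = [
    save_clipped_reads    : 'reads/clipped',
    save_phixremoved_reads: 'reads/phix_removed',
    save_hostremoved_reads: 'reads/host_removed',
    save_bbnorm_reads     : 'reads/bbnorm',
]

// Returns the directory of the most processed saved reads, or null when no
// samplesheet was requested. Throws if a samplesheet is requested but no
// --save_* flag is set (a stand-in for Nextflow.error in a real pipeline).
def selectReadsDir = { Map params, Map<String, String> dirs ->
    if (!params.generate_samplesheet) {
        return null
    }
    // Keep only the flags that are switched on, preserving the map's order.
    List<String> enabled = dirs.keySet().toList().findAll { params[it] }
    if (!enabled) {
        throw new IllegalArgumentException(
            'must save at least one set of processed reads if --generate_samplesheet is set')
    }
    // The last enabled flag corresponds to the latest processing step.
    return dirs[enabled.last()]
}

def params = [generate_samplesheet: true, save_clipped_reads: true, save_phixremoved_reads: true]
println selectReadsDir(params, READ_DIRS)
```

Keeping the flags in one ordered map avoids spelling out every combination of negated parameters, at the cost of having to keep that map in sync with the pipeline's actual save options.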

Contributor Author

@CarsonJM Mar 22, 2024

Totally see what you're saying, and I was curious what the thoughts would be about doing this outside a process. I am more than happy to change it to what you're suggesting!

The reason I was giving this approach a try was that I was hoping to find a generalizable approach that would work across pipelines/updates even if the save input logic you're talking about gets modified (also, I selfishly wanted to find a quicker approach haha). Also, I was hoping to force a copy because in the past I've symlinked to publishDir thinking I had copied, and had my reads disappear because my workDir is in a scratch directory 😅.

Still, I have definitely thought about what you're saying and will make that change if you think it's best!
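
The symlink failure mode described here can be reproduced in a few lines of plain Groovy. This is only a sketch using temporary directories as stand-ins for the work and results directories; the file names are made up. (Within a process, Nextflow's `publishDir` directive accepts `mode: 'copy'` to avoid the same problem.)

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Temporary stand-ins for a scratch work directory and a results directory.
Path work    = Files.createTempDirectory('work')
Path results = Files.createTempDirectory('results')

// A tiny fake FASTQ file living in the work directory.
Path reads = Files.write(work.resolve('sample.fastq'), '@r1\nACGT\n+\nIIII\n'.bytes)

// Publish it twice: once as a symlink, once as a real copy.
Path linked = Files.createSymbolicLink(results.resolve('linked.fastq'), reads)
Path copied = Files.copy(reads, results.resolve('copied.fastq'))

// Simulate the scratch directory being cleaned up.
Files.delete(reads)

// Files.exists follows symlinks, so a dangling link counts as missing.
boolean linkUsable = Files.exists(linked)
boolean copyUsable = Files.exists(copied)
println "symlink usable: ${linkUsable}, copy usable: ${copyUsable}"
```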

if ( !meta.single_end ){
file(meta.fastq_2.toUriString(), checkIfExists: true).copyTo("${params.outdir}/downstream_samplesheets/fastq/${meta.fastq_2.name}")
[ "${meta.id}_phageannotator_samplesheet.csv",
"sample,group,fastq_1,fastq_2,fasta" +
'\n' +
"${meta.id},${meta.group}," +
file("${params.outdir}/downstream_samplesheets/fastq/${meta.fastq_1.name}").toString() + "," +
file("${params.outdir}/downstream_samplesheets/fastq/${meta.fastq_2.name}").toString() + "," +
file("${params.outdir}/downstream_samplesheets/fasta/${meta.fasta.name}").toString() +
'\n'
]
} else {
// Create samplesheet for each sample using meta information
[ "${meta.id}_phageannotator_samplesheet.csv",
"sample,group,fastq_1,fastq_2,fasta" +
'\n' +
"${meta.id},${meta.group}," +
file("${params.outdir}/downstream_samplesheets/fastq/${meta.fastq_1.name}").toString() + "," +
"," +
file("${params.outdir}/downstream_samplesheets/fasta/${meta.fasta.name}").toString() +
'\n'
]
}
}

// Merge samplesheet across all samples for the pipeline
ch_mag_id_samplesheets.collectFile(name: "phageannotator_samplesheet.csv", keepHeader:true, skip:1, storeDir:"${params.outdir}/downstream_samplesheets/")