Add start from bam #193

Lucpen · 2025-01-04T12:38:08Z

PR checklist

github-actions · 2025-01-04T12:40:12Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 94eeb9d

✅ 182 tests passed
❔ 26 tests were ignored
❗ 8 tests had warnings

❗ Test warnings:

files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
nextflow_config - Config manifest.version should end in dev: 3.0.0
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed

❔ Tests ignored:

files_exist - File is ignored: assets/nf-core-tomte_logo_light.png
files_exist - File is ignored: docs/images/nf-core-tomte_logo_light.png
files_exist - File is ignored: docs/images/nf-core-tomte_logo_dark.png
files_exist - File is ignored: docs/images/tomte_logo.eps
files_exist - File is ignored: docs/images/tomte_pipeline_metromap.svg
files_exist - File is ignored: docs/images/tomte_pipeline_metromap.png
files_exist - File is ignored: conf/modules.config
files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-tomte_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-tomte_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-tomte_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/tomte/tomte/.github/workflows/awstest.yml
modules_config - modules_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: conf/igenomes_ignored.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-tomte_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowTomte.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-schema plugin
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: validation.help.enabled
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable found: validation.help.beforeText
nextflow_config - Config variable found: validation.help.afterText
nextflow_config - Config variable found: validation.help.command
nextflow_config - Config variable found: validation.summary.beforeText
nextflow_config - Config variable found: validation.summary.afterText
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config variable (correctly) not found: params.max_cpus
nextflow_config - Config variable (correctly) not found: params.max_memory
nextflow_config - Config variable (correctly) not found: params.max_time
nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
nextflow_config - Config variable (correctly) not found: params.validationLenientMode
nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.outdir= results
nextflow_config - Config default value correct: params.save_mapped_as_cram= true
nextflow_config - Config default value correct: params.genome= GRCh38
nextflow_config - Config default value correct: params.gencode_annotation_version= 46
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.platform= illumina
nextflow_config - Config default value correct: params.save_reference= true
nextflow_config - Config default value correct: params.vep_cache_version= 112
nextflow_config - Config default value correct: params.skip_download_vep= true
nextflow_config - Config default value correct: params.skip_download_gnomad= true
nextflow_config - Config default value correct: params.min_trimmed_length= 40
nextflow_config - Config default value correct: params.star_two_pass_mode= Basic
nextflow_config - Config default value correct: params.skip_subsample_region= false
nextflow_config - Config default value correct: params.skip_downsample= false
nextflow_config - Config default value correct: params.seed_frac= 0.001
nextflow_config - Config default value correct: params.num_reads= 120000000
nextflow_config - Config default value correct: params.variant_caller= bcftools
nextflow_config - Config default value correct: params.bcftools_caller_mode= multiallelic
nextflow_config - Config default value correct: params.skip_variant_calling= false
nextflow_config - Config default value correct: params.skip_build_tracks= false
nextflow_config - Config default value correct: params.skip_stringtie= false
nextflow_config - Config default value correct: params.skip_vep= false
nextflow_config - Config default value correct: params.skip_drop_ae= false
nextflow_config - Config default value correct: params.skip_drop_as= false
nextflow_config - Config default value correct: params.skip_export_counts_drop= true
nextflow_config - Config default value correct: params.drop_group_samples_ae= outrider
nextflow_config - Config default value correct: params.drop_group_samples_as= fraser
nextflow_config - Config default value correct: params.drop_padjcutoff_ae= 0.05
nextflow_config - Config default value correct: params.drop_padjcutoff_as= 0.1
nextflow_config - Config default value correct: params.drop_zscorecutoff= 0.0
nextflow_config - Config default value correct: params.skip_peddy= false
nextflow_config - Config default value correct: params.skip_calculate_hb_frac= false
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/raredisease
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Nextflow minimum version badge matched config. Badge: 24.04.2, Config: 24.04.2
readme - README Zenodo placeholder was replaced with DOI.
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: template_version_comment.yml
actions_schema_validation - Workflow validation passed: ci.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.2

Run details

nf-core/tools version 3.0.2
Run at 2025-01-09 09:09:34

fellen31

Nice, starting from BAM sounds like an improvement! Do you think adding a BAM start as a CI test is necessary?

fellen31 · 2025-01-09T10:03:28Z

docs/output.md

@@ -58,6 +58,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 #### Salmon

 [`Salmon`](https://salmon.readthedocs.io/en/latest/) quantifies reads.
+Note that as Salmon has been setup to start from fastq files, it will not run if the pipeline starts from bam files.


I don't know Salmon very well. Would it make sense to convert bam to fastq in order to run Salmon when starting from BAM?

Yes, it would. However, I think that should be done in a separate PR, perhaps the next one.
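If BAM-to-FASTQ conversion is picked up in a follow-up PR, one common approach is to group mates together with `samtools collate` and then split them into R1/R2 files with `samtools fastq`. The sketch below is not part of this PR: it assumes paired-end input, and the function name `bam_to_paired_fastq` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): convert a possibly coordinate-sorted,
# paired-end BAM into gzipped R1/R2 FASTQ files that Salmon could consume.
bam_to_paired_fastq() {
    local bam="$1" prefix="$2" threads="${3:-1}"
    # collate: put mates next to each other (-u uncompressed, -O write to stdout).
    # fastq:   write READ1/READ2 to separate files (.gz names are compressed),
    #          discard reads with neither/both READ flags (-0) and singletons (-s),
    #          and keep read names without /1 and /2 suffixes (-n).
    samtools collate -u -O -@ "$threads" "$bam" \
        | samtools fastq -@ "$threads" -n \
            -1 "${prefix}_R1.fastq.gz" -2 "${prefix}_R2.fastq.gz" \
            -0 /dev/null -s /dev/null -
}
```

For example, `bam_to_paired_fastq sample.bam sample 4` would write `sample_R1.fastq.gz` and `sample_R2.fastq.gz`. Single-end BAMs would need a different `samtools fastq` invocation.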

fellen31 · 2025-01-09T10:05:07Z

modules/local/drop_sample_annot.nf

-    def id = "${ids}".replace("[","").replace("]","").replace(",","")
-    def single_end = "${single_ends}".replace("[","").replace("]","").replace(",","")
-    def sex_drop = "${sex}".replace("[","").replace("]","").replace(",","").replace("1","M").replace("2","F").replace("0","NA").replace("other","NA")
-    def strandedness = "${strandednesses}".replace("[","").replace("]","").replace(",","")
+    def id = ids.join(' ')
+    def single_end = single_ends.join(' ')
+    def sex_drop = sex.collect { it.replace("1","M").replace("2","F").replace("0","NA").replace("other","NA") }.join(' ')
+    def strandedness = strandednesses.join(' ')


Why was this changed?

It is more concise; it's just to prettify the code.

Okay, so the `.replace(...)` calls are no longer necessary?
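The bracket and comma `.replace(...)` calls in the old version were needed because interpolating a Groovy list (`"${ids}"`) renders it as `[a, b, c]`, which does not word-split into clean tokens for a bash array; `join(' ')` emits the space-separated form directly. Only the sex-code mapping still needs `replace`. The shell sketch below illustrates both points; `map_sex_codes` is a hypothetical helper, and it maps whole tokens rather than substrings, a slight simplification of the chained `replace` calls.

```shell
#!/usr/bin/env bash
# Old approach: a Groovy list interpolated into a string renders as "[1, 2, 0]",
# which word-splits into "[1," "2," "0]", hence the bracket/comma stripping.
read -ra from_tostring <<< "[1, 2, 0]"
# New approach: list.join(' ') renders as "1 2 0", which splits into clean tokens.
read -ra from_join <<< "1 2 0"

# Hypothetical helper: map PED-style sex codes to the labels written to the DROP
# sample annotation, mirroring the replace() chain (1 -> M, 2 -> F, 0/other -> NA).
map_sex_codes() {
    local code
    local -a labels=()
    for code in "$@"; do
        case "$code" in
            1) labels+=("M") ;;
            2) labels+=("F") ;;
            0|other) labels+=("NA") ;;
            *) labels+=("$code") ;;  # any other value passes through unchanged
        esac
    done
    echo "${labels[*]}"
}

map_sex_codes "${from_join[@]}" other
```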

fellen31 · 2025-01-09T10:13:20Z

modules/local/drop_sample_annot.nf

+    SINGLE_ENDS=(${single_end})
+    BAMS=(${bam.join(' ')})
+
+    # Check if single_end values are provided
+    updated_single_ends=()
+    for ((i=0; i<\${#SINGLE_ENDS[@]}; i++)); do
+        if [[ "\${SINGLE_ENDS[i]}" == "null" ]]; then
+            result=\$(samtools view -c -f 1 "\${BAMS[i]}" | awk '{print \$1 == 0 ? "true" : "false"}')
+            updated_single_ends+=("\$result")
+        else
+            updated_single_ends+=("\${SINGLE_ENDS[i]}")
+        fi
+    done
+
+    # Convert updated_single_ends array to space-separated string and save to file
+    echo "\${updated_single_ends[*]}" > updated_single_ends.txt
+


It's hard for me to understand what's happening here. Are you checking whether there are any paired-end reads in the BAM file and then printing true or false? Isn't this information already in the meta/single_end input to this process?

It will be there if you start from FASTQ, but not if you start from BAM; that's why I added it. I'm very open to suggestions on how to make it more readable, because I totally agree 😄
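One possible way to make the block more readable is to move the flag check into a small named function and keep the loop to the fallback logic only. This is a sketch, not the PR's code: the function names are hypothetical, the `single_end` values are passed as one space-separated string followed by the BAM paths, and when embedded in the Nextflow `script:` block every bash `$` would still need escaping as `\$`, as in the diff above. It keeps the PR's rule: a BAM with zero reads carrying the paired flag (`0x1`, counted with `samtools view -c -f 1`) is treated as single-end.

```shell
#!/usr/bin/env bash
# Print "true" if the BAM has no reads with the paired flag (0x1) set, else "false".
is_single_end_bam() {
    local bam="$1"
    local paired_reads
    # -c counts matching records; -f 1 keeps only records with FLAG bit 0x1 (paired)
    paired_reads=$(samtools view -c -f 1 "$bam")
    if [[ "$paired_reads" -eq 0 ]]; then
        echo "true"
    else
        echo "false"
    fi
}

# Usage: resolve_single_ends "<space-separated single_end values>" bam1 bam2 ...
# Known values are kept; "null" values are inferred from the BAM at the same position.
resolve_single_ends() {
    local -a single_ends
    read -ra single_ends <<< "$1"
    shift
    local -a bams=("$@")
    local -a resolved=()
    local i
    for i in "${!single_ends[@]}"; do
        if [[ "${single_ends[i]}" == "null" ]]; then
            resolved+=("$(is_single_end_bam "${bams[i]}")")
        else
            resolved+=("${single_ends[i]}")
        fi
    done
    echo "${resolved[*]}"
}
```

In the process, the result could then be written out the same way as before, e.g. `resolve_single_ends "null false" a.bam b.bam > updated_single_ends.txt`.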

fellen31 · 2025-01-09T10:13:49Z

subworkflows/local/alignment.nf

+        ch_fastq_reads        // channel:   [optional] [ val(meta), [path(reads)]  ]
+        ch_bam_bai_reads      // channel:   [optional] [ val(meta), [path(bam) path(bai)]  ]


Suggested change

ch_fastq_reads // channel: [optional] [ val(meta), [path(reads)] ]

ch_bam_bai_reads // channel: [optional] [ val(meta), [path(bam) path(bai)] ]


fellen31 · 2025-01-09T12:09:39Z

subworkflows/local/alignment.nf

+        ch_bam_reads = ch_bam_bai_reads.map { meta, bambai -> [ meta, bambai[0] ] }
+        ch_bai_reads = ch_bam_bai_reads.map { meta, bambai -> [ meta, bambai[1] ] }
+
+        ch_bam_aligned=ch_bam_reads.mix(STAR_ALIGN.out.bam_sorted_aligned)


Suggested change

-        ch_bam_aligned=ch_bam_reads.mix(STAR_ALIGN.out.bam_sorted_aligned)
+        ch_bam_aligned = ch_bam_reads.mix(STAR_ALIGN.out.bam_sorted_aligned)

`ch_bam_reads` can now also contain aligned reads, right? So `ch_bam_aligned` and `ch_bai` are "equivalents"?

I think I follow, but perhaps the channels could have more expressive names. With the different `if` branches, it's a bit hard for me to see the difference between `ch_bam_aligned` + `ch_bai`, `ch_bam_bai` and `ch_bam_bai_out`.

fellen31 · 2025-01-09T12:26:54Z

subworkflows/local/utils_nfcore_tomte_pipeline/main.nf

+
+ch_samplesheet
+    .map { meta, files ->
+        [meta.sample, groupKey(meta + [id: meta.sample], meta.fq_pairs ?: 1), files]


How does meta.fq_pairs ?: 1 work for the bam branch?

fellen31 · 2025-01-09T12:29:00Z

subworkflows/local/utils_nfcore_tomte_pipeline/main.nf

-    versions    = ch_versions
+emit:
+samplesheet = ch_samplesheet
+versions    = ch_versions


Indent everything (from line 75 to here)?

fellen31 · 2025-01-09T12:30:08Z

subworkflows/local/utils_nfcore_tomte_pipeline/main.nf

+        def expectedNumber = meta[0].single_end ? 1 : 2
+        def sampleNumber = files.flatten().size()
+        if (expectedNumber != sampleNumber) {
+            error("Samplesheet contains incorrect number of fastq files for sample ${sample}. Expected ${expectedNumber}, got ${sampleNumber}.")


fellen31 · 2025-01-09T12:32:13Z

docs/usage.md

+| `bam`          | Full path to BAM file.                                                                                                                                                                 | Provide either fastq_1 or bam |
+| `bai`          | Full path to BAM index file.                                                                                                                                                           | Provide either fastq_2 or bai |


Do the descriptions match here? "Provide either fastq_2 or bai"

That part says whether the file is mandatory or not; it's the third column.

I thought it was a copy-paste error. Shouldn't bai always be mandatory when you provide a bam, or will it work without a bai?
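If `bai` were made optional, one option would be to create a missing index before any step that needs it. This is a hypothetical sketch, not what the PR does; the helper name `resolve_bai` is an assumption, and it relies only on `samtools index` writing `<bam>.bai` next to the BAM by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path to a usable BAM index, creating one with
# `samtools index` when no existing .bai was supplied.
resolve_bai() {
    local bam="$1" bai="${2:-}"
    if [[ -n "$bai" && -f "$bai" ]]; then
        echo "$bai"          # an index was supplied and exists: use it as-is
        return 0
    fi
    # samtools index writes <bam>.bai next to the BAM by default
    samtools index "$bam" || return 1
    echo "${bam}.bai"
}
```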

fellen31 · 2025-01-09T12:37:08Z

docs/usage.md

@@ -98,17 +98,19 @@ Running the pipeline involves three steps:

 #### Samplesheet

-A samplesheet is used to pass the information about the sample(s), such as the path to the FASTQ files and other meta data (sex, phenotype, etc.,) to the pipeline in csv format.
+A samplesheet is used to pass the information about the sample(s), such as the path to the FASTQ/BAM files and other meta data (sex, phenotype, etc.,) to the pipeline in csv format.


In long-read sequencing, unaligned reads are often stored as BAM files rather than FASTQ. Is it necessary to specify that the BAM should contain aligned reads, or would it be obvious to the user that BAM means aligned reads?

To be honest, I'm unsure; I even wonder whether it would work with uBAM.

I guess for most people BAM = aligned reads, so I think you can ignore my question.

Lucpen added 4 commits December 30, 2024 13:08

fixed input as bam

eecb5df

fixed channel fastq

7521404

modules/local/drop_sample_annot.nf

ca10c2c

bam taken as input

172851f

Lucpen added 5 commits January 5, 2025 12:18

add single_end when bam file provided

4ce8335

run prettier remove trailing whitespace

b654a9d

fix whitespave

efc9857

add documentation

2c67d3f

prettier

94eeb9d

Lucpen marked this pull request as ready for review January 9, 2025 09:11

Lucpen requested a review from a team as a code owner January 9, 2025 09:11

Lucpen linked an issue Jan 9, 2025 that may be closed by this pull request

Enable bam input #59

Open

Lucpen added the enhancement and Ready for review labels Jan 9, 2025

fellen31 self-requested a review January 9, 2025 12:21

fellen31 reviewed Jan 9, 2025
