Release 0.5 #104

Merged on Jul 31, 2024 (42 commits)

Commits
0d4c6e7
Update sanger_test.yml - use large tower env
gq1 Feb 13, 2024
1bf0d25
Update sanger_test_full.yml - use large tower env
gq1 Feb 13, 2024
2eea140
Update sanger_test.yml - just add the push condition won't work
gq1 Feb 13, 2024
4992f1c
Update sanger_test_full.yml
gq1 Feb 13, 2024
c36fb21
Merge pull request #93 from sanger-tol/large_tower_env
muffato Feb 14, 2024
acbc472
Merge branch 'main' into dev
muffato Apr 17, 2024
305f87a
Skip the subworkflow if the nohit list is empty
muffato May 1, 2024
2737952
Fixed the documentation
muffato May 1, 2024
948e139
Increased the resources for blastn
muffato May 9, 2024
a25e4bb
Version bump
muffato May 13, 2024
162b5fa
Removed an unused parameter
muffato May 13, 2024
afb27d3
Removed trailing whitespace
muffato May 17, 2024
4a226cf
Channel.of(...).first() is the same as Channel.value(...)
muffato May 17, 2024
0b12ad4
The accession is not appropriate there
muffato May 17, 2024
043a120
Removed some parameters that could in fact not be changed
muffato May 17, 2024
5670deb
The accession number is in fact mutually exclusive of the yaml file
muffato May 17, 2024
083dbb8
This should be value channel
muffato May 19, 2024
9e36a7a
Improved the type and handling of the busco_db channel
muffato May 19, 2024
5c99602
Nicer name, even for compressed files
muffato May 20, 2024
b0e4e57
This module is not used
muffato May 20, 2024
9b1ecc3
typo
muffato May 21, 2024
6581abf
Merge pull request #98 from sanger-tol/clean_params
muffato May 23, 2024
b3c852c
update the default branch from master to main for PR checking
gq1 Jun 5, 2024
01c7d66
Add one more unchanged file for nf-core linting
gq1 Jun 5, 2024
d16a4df
Merge pull request #100 from sanger-tol/ci_branch
gq1 Jun 5, 2024
2ae99f6
bugfix: this should be a megablast
muffato Jun 10, 2024
cfac699
diamond blast should run with the --taxon-exclude option
muffato Jun 24, 2024
c5da99d
Publish peripheral data as well, even if we don't use it ourselves
muffato Jun 4, 2024
e4b63a8
Updated the documentation
muffato Jun 4, 2024
075dce3
`true` is not needed
muffato Jun 5, 2024
aef0532
Switched to the default btk biocontainers because the pacbio one is gone
muffato Jul 1, 2024
7547ead
Merge pull request #99 from sanger-tol/more_published_files
muffato Jul 1, 2024
db3e317
Ignore the missing taxon_id errors issues when -negative_taxids is set
muffato Jun 24, 2024
8cfe4e5
CHANGELOG
muffato Jul 1, 2024
93260b1
blastn needs a bit more memory
muffato Jun 10, 2024
310a812
Adjusted the resources (incl. runtime) of blastn
muffato Jul 1, 2024
b83f05c
There are blast failures we don't know how to fix. Just ignore for now
muffato Jul 15, 2024
5420de2
Merge pull request #103 from sanger-tol/more_bugfixes
muffato Jul 18, 2024
0809ce0
Update input_check.nf
BethYates Jul 30, 2024
a8e2bba
Update CHANGELOG.md
BethYates Jul 30, 2024
50e5431
Merge pull request #107 from sanger-tol/fix_splitCsv_in_fetch_input.nf
BethYates Jul 31, 2024
544c135
Update the release date
muffato Jul 31, 2024
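
One of the commits above (4a226cf, "Channel.of(...).first() is the same as Channel.value(...)") rests on a Nextflow equivalence worth spelling out: applying `first()` to a queue channel produces a value channel, which downstream processes can read any number of times. A minimal sketch, using a made-up accession string rather than anything from this pipeline:

```groovy
// Illustrative only: the accession string is made up, not taken from the pipeline.
ch_a = Channel.of('GCA_000000000.1').first()   // queue channel reduced to a value channel
ch_b = Channel.value('GCA_000000000.1')        // value channel created directly

// Both behave identically and can be consumed repeatedly,
// so the second, more direct form is the clearer choice.
ch_a.view { "via of().first(): $it" }
ch_b.view { "via value():      $it" }
```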
16 changes: 8 additions & 8 deletions .github/workflows/branch.yml
@@ -1,15 +1,15 @@
name: nf-core branch protection
# This workflow is triggered on PRs to master branch on the repository
# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
# This workflow is triggered on PRs to main branch on the repository
# It fails when someone tries to make a PR against the nf-core `main` branch instead of `dev`
on:
pull_request_target:
branches: [master]
branches: [main]

jobs:
test:
runs-on: ubuntu-latest
steps:
# PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
# PRs to the nf-core repo main branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
- name: Check PRs
if: github.repository == 'sanger-tol/blobtoolkit'
run: |
@@ -22,7 +22,7 @@ jobs:
uses: mshick/add-pr-comment@v1
with:
message: |
## This PR is against the `master` branch :x:
## This PR is against the `main` branch :x:

* Do not close this PR
* Click _Edit_ and change the `base` to `dev`
@@ -32,9 +32,9 @@ jobs:

Hi @${{ github.event.pull_request.user.login }},

It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
The `master` branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `main` branch.
The `main` branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to `main` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.

You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.
2 changes: 1 addition & 1 deletion .github/workflows/sanger_test.yml
@@ -17,7 +17,7 @@ jobs:
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV_LARGE }}
revision: ${{ env.REVISION }}
workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ env.REVISION }}
parameters: |
6 changes: 5 additions & 1 deletion .github/workflows/sanger_test_full.yml
@@ -1,6 +1,10 @@
name: sanger-tol LSF full size tests

on:
push:
branches:
- main
- dev
workflow_dispatch:
jobs:
run-tower:
@@ -22,7 +26,7 @@ jobs:
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV_LARGE }}
revision: ${{ env.REVISION }}
workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ env.REVISION }}
parameters: |
1 change: 1 addition & 0 deletions .nf-core.yml
@@ -18,6 +18,7 @@ lint:
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/PULL_REQUEST_TEMPLATE.md
- .github/workflows/linting.yml
- .github/workflows/branch.yml
multiqc_config:
- report_comment
nextflow_config:
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,30 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[0.5.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.5.0)] – Snorlax – [2024-07-31]

General tidy up of the configuration and the pipeline

### Enhancements & fixes

- Increased the resources for blastn
- Removed some options that were not used or not needed
- All relevant outputs are now copied to the output directory
- Fixed some blast parameters to match the behaviour of the Snakemake pipeline
- Fixed parsing of samplesheets from fetchngs to capture correct data type

### Parameters

| Old parameter | New parameter |
| --------------- | ------------- |
| --taxa_file | |
| --blastp_outext | |
| --blastp_cols | |
| --blastx_outext | |
| --blastx_cols | |

> **NB:** Parameter has been **updated** if both old and new parameter information is present. </br> **NB:** Parameter has been **added** if just the new parameter information is present. </br> **NB:** Parameter has been **removed** if new parameter information isn't present.

## [[0.4.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.4.0)] – Buneary – [2024-04-17]

The pipeline has now been validated on dozens of genomes, up to 11 Gbp.
4 changes: 2 additions & 2 deletions README.md
@@ -20,8 +20,8 @@ It takes a samplesheet of BAM/CRAM/FASTQ/FASTA files as input, calculates genome
4. Run BUSCO ([`busco`](https://busco.ezlab.org/))
5. Extract BUSCO genes ([`blobtoolkit/extractbuscos`](https://github.com/blobtoolkit/blobtoolkit))
6. Run Diamond BLASTp against extracted BUSCO genes ([`diamond/blastp`](https://github.com/bbuchfink/diamond))
7. Run BLASTn against extracted BUSCO genes ([`blast/blastn`](https://www.ncbi.nlm.nih.gov/books/NBK131777/))
8. Run BLASTx against extracted BUSCO genes ([`blast/blastx`](https://www.ncbi.nlm.nih.gov/books/NBK131777/))
7. Run BLASTx against sequences with no hit ([`blast/blastn`](https://www.ncbi.nlm.nih.gov/books/NBK131777/))
8. Run BLASTn against sequences still with no hit ([`blast/blastx`](https://www.ncbi.nlm.nih.gov/books/NBK131777/))
9. Count BUSCO genes ([`blobtoolkit/countbuscos`](https://github.com/blobtoolkit/blobtoolkit))
10. Generate combined sequence stats across various window sizes ([`blobtoolkit/windowstats`](https://github.com/blobtoolkit/blobtoolkit))
11. Imports analysis results into a BlobDir dataset ([`blobtoolkit/blobdir`](https://github.com/blobtoolkit/blobtoolkit))
12 changes: 12 additions & 0 deletions conf/base.config
@@ -104,6 +104,18 @@ process {
time = { check_max( 3.h * Math.ceil(meta.genome_size/1000000000) * task.attempt, 'time') }
}

withName: "BLAST_BLASTN" {

// There are blast failures we don't know how to fix. Just ignore for now
errorStrategy = { task.exitStatus in ((130..145) + 104) ? (task.attempt == process.maxRetries ? 'ignore' : 'retry') : 'finish' }

// Most jobs complete quickly but some need a lot longer. For those outliers,
// the CPU usage remains usually low, often nearing a single CPU
cpus = { check_max( 6 - (task.attempt-1), 'cpus' ) }
memory = { check_max( 1.GB * Math.pow(4, task.attempt-1), 'memory' ) }
time = { check_max( 10.h * Math.pow(4, task.attempt-1), 'time' ) }
}

withName:CUSTOM_DUMPSOFTWAREVERSIONS {
cache = false
}
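
The new `BLAST_BLASTN` block above escalates memory and runtime geometrically on each retry and only downgrades a persistent failure to `ignore` once the final attempt has been used. A stand-alone sketch of the same pattern is below; it mirrors the values shown in the diff, but `maxRetries = 2` and the omission of the `check_max` wrapper are simplifications, not the pipeline's exact configuration.

```groovy
// Illustrative retry-escalation pattern for BLAST_BLASTN (simplified, not the
// pipeline's exact settings; maxRetries = 2 is an assumed value here).
process {
    withName: 'BLAST_BLASTN' {
        maxRetries = 2

        errorStrategy = {
            // Retry on resource-related exit codes; once the final attempt has
            // also failed, ignore the task rather than aborting the whole run.
            task.exitStatus in ((130..145) + 104) ?
                (task.attempt > 2 ? 'ignore' : 'retry') :
                'finish'
        }

        // Attempt 1 -> 1 GB / 10 h, attempt 2 -> 4 GB / 40 h, attempt 3 -> 16 GB / 160 h,
        // while the CPU request shrinks because the long-running jobs rarely use all cores.
        cpus   = { 6 - (task.attempt - 1) }
        memory = { 1.GB * Math.pow(4, task.attempt - 1) }
        time   = { 10.h * Math.pow(4, task.attempt - 1) }
    }
}
```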
26 changes: 25 additions & 1 deletion conf/modules.config
@@ -48,6 +48,14 @@ process {
ext.args = { "-ax map-ont -I" + Math.ceil(meta2.genome_size/1e9) + 'G' }
}

withName: "MINIMAP2_.*" {
publishDir = [
path: { "${params.outdir}/read_mapping/${meta.datatype}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals("versions.yml") ? null : filename }
]
}

withName: "SAMTOOLS_VIEW" {
ext.args = "--output-fmt bam --write-index"
}
@@ -60,6 +68,22 @@ process {
ext.args = "--lineage --busco"
}

withName: "PIGZ_COMPRESS" {
publishDir = [
path: { "${params.outdir}/base_content" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals("versions.yml") ? null : filename.minus("fw_out/") }
]
}

withName: "BLOBTK_DEPTH" {
publishDir = [
path: { "${params.outdir}/read_mapping/${meta.datatype}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals("versions.yml") ? null : "${meta.id}.coverage.1k.bed.gz" }
]
}

withName: "BUSCO" {
scratch = true
ext.args = { 'test' in workflow.profile.tokenize(',') ?
@@ -114,7 +138,7 @@
}

withName: "BLAST_BLASTN" {
ext.args = "-outfmt '6 qseqid staxids bitscore std' -max_target_seqs 10 -max_hsps 1 -evalue 1.0e-10 -lcase_masking -dust '20 64 1'"
ext.args = "-task megablast -outfmt '6 qseqid staxids bitscore std' -max_target_seqs 10 -max_hsps 1 -evalue 1.0e-10 -lcase_masking -dust '20 64 1'"
}

withName: "CUSTOM_DUMPSOFTWAREVERSIONS" {
63 changes: 55 additions & 8 deletions docs/output.md
@@ -15,6 +15,9 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [BlobDir](#blobdir) - Output files viewable on a [BlobToolKit viewer](https://github.com/blobtoolkit/blobtoolkit)
- [Static plots](#static-plots) - Static versions of the BlobToolKit plots
- [BUSCO](#busco) - BUSCO results
- [Read alignments](#read-alignments) - Aligned reads (optional)
- [Read coverage](#read-coverage) - Read coverage tracks
- [Base content](#base-content) - _k_-mer statistics (for k &le; 4)
- [MultiQC](#multiqc) - Aggregate report describing results from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -26,8 +29,8 @@ The files in the BlobDir dataset which is used to create the online interactive
<summary>Output files</summary>

- `blobtoolkit/`
- `<accession>/`
- `*.json.gz`: files generated from genome and alignment coverage statistics
- `<assembly-name>/`
- `*.json.gz`: files generated from genome and alignment coverage statistics.

More information about visualising the data in the [BlobToolKit repository](https://github.com/blobtoolkit/blobtoolkit/tree/main/src/viewer)

@@ -53,12 +56,56 @@ BUSCO results generated by the pipeline (all BUSCO lineages that match the claas
<details markdown="1">
<summary>Output files</summary>

- `blobtoolkit/`
- `busco/`
- `*.batch_summary.txt`: BUSCO scores as tab-separated files (1 file per lineage).
- `*.fasta.txt`: BUSCO scores as formatted text (1 file per lineage).
- `*.json`: BUSCO scores as JSON (1 file per lineage).
- `*/`: all output BUSCO files, including the coordinate and sequence files of the annotated genes.
- `busco/`
- `<lineage-name>/`
- `short_summary.json`: BUSCO scores for that lineage as JSON.
- `short_summary.tsv`: BUSCO scores for that lineage as a tab-separated file.
- `short_summary.txt`: BUSCO scores for that lineage as formatted text.
- `full_table.tsv`: Coordinates of the annotated BUSCO genes as a tab-separated file.
- `missing_busco_list.tsv`: List of the BUSCO genes that could not be found.
- `*_busco_sequences.tar.gz`: Sequences of the annotated BUSCO genes. 1 _tar_ archive for each of the three annotation levels (`single_copy`, `multi_copy`, `fragmented`), with 1 file per gene.
- `hmmer_output.tar.gz`: Archive of the HMMER alignment scores.

</details>

### Read alignments

Read alignments in BAM format -- only if the pipeline is run with `--align`.

<details markdown="1">
<summary>Output files</summary>

- `read_mapping/`
- `<datatype>/`
- `<sample>.bam`: alignments of that sample's reads in BAM format.

</details>

### Read coverage

Read coverage statistics as computed by the pipeline.
Those files are the raw data used to build the BlobDir.

<details markdown="1">
<summary>Output files</summary>

- `read_mapping/`
- `<datatype>/`
- `<sample>.coverage.1k.bed.gz`: Bedgraph file with the coverage of the alignments of that sample per 1 kbp windows.

</details>

### Base content

_k_-mer statistics.
Those files are the raw data used to build the BlobDir.

<details markdown="1">
<summary>Output files</summary>

- `base_content/`
- `<assembly-name>_*nuc_windows.tsv.gz`: Tab-separated files with the counts of every _k_-mer for k &le; 4 in 1 kbp windows. The first three columns correspond to the coordinates (sequence name, start, end), followed by each _k_-mer.
- `<assembly-name>_freq_windows.tsv.gz`: Tab-separated files with frequencies derived from the _k_-mer counts.

</details>

11 changes: 9 additions & 2 deletions modules.json
@@ -30,12 +30,14 @@
"diamond/blastp": {
"branch": "master",
"git_sha": "b29f6beb86d1d24d680277fb1a3f4de7b8b8a92c",
"installed_by": ["modules"]
"installed_by": ["modules"],
"patch": "modules/nf-core/diamond/blastp/diamond-blastp.diff"
},
"diamond/blastx": {
"branch": "master",
"git_sha": "b29f6beb86d1d24d680277fb1a3f4de7b8b8a92c",
"installed_by": ["modules"]
"installed_by": ["modules"],
"patch": "modules/nf-core/diamond/blastx/diamond-blastx.diff"
},
"fastawindows": {
"branch": "master",
@@ -64,6 +66,11 @@
"git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a",
"installed_by": ["modules"]
},
"pigz/compress": {
"branch": "master",
"git_sha": "0eab94fc1e48703c1b0a8704bd665f554905c39d",
"installed_by": ["modules"]
},
"samtools/fasta": {
"branch": "master",
"git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/updatemeta.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_UPDATEMETA {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_UPDATEMETA module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/pacificbiosciences/pyyaml:5.3.1"
container "docker.io/genomehubs/blobtoolkit:4.3.9"

input:
tuple val(meta), path(input)
25 changes: 21 additions & 4 deletions modules/nf-core/blast/blastn/blast-blastn.diff (diff not rendered)

10 changes: 8 additions & 2 deletions modules/nf-core/blast/blastn/main.nf (diff not rendered)