Merge pull request #117 from sanger-tol/misc_fixes
Misc fixes before release
tkchafin authored Oct 16, 2024
2 parents 102dbf4 + dfb4655 commit 93bda84
Showing 13 changed files with 46 additions and 22 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,16 @@ The pipeline is now considered to be a complete and suitable replacement for the
"grid plots".
- Fill in accurate read information in the blobDir. Users are now required
  to indicate in the samplesheet whether the reads are paired or single.
- Updated the Blastn settings to allow at most 7 days of runtime, since that
  covers 99.7% of the jobs.

### Software dependencies

Note that since the pipeline uses Nextflow DSL2, each process is run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means it is entirely possible for the pipeline to use different versions of the same tool on occasion. However, the overall software dependency changes compared to the last release are listed below for reference. Only `Docker` or `Singularity` containers are supported; `conda` is not.

| Dependency | Old version | New version |
| ----------- | ----------- | ----------- |
| blobtoolkit | 4.3.9 | 4.3.13 |

## [[0.6.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.6.0)] – Bellsprout – [2024-09-13]

13 changes: 7 additions & 6 deletions conf/base.config
@@ -106,14 +106,15 @@ process {

withName: "BLAST_BLASTN" {

// There are blast failures we don't know how to fix. Just ignore for now
errorStrategy = { task.exitStatus in ((130..145) + 104) ? (task.attempt == process.maxRetries ? 'ignore' : 'retry') : 'finish' }
// There are blast failures we don't know how to fix. We just give up after 3 attempts
errorStrategy = { task.exitStatus in ((130..145) + 104) ? (task.attempt == 3 ? 'ignore' : 'retry') : 'finish' }


// Most jobs complete quickly but some need a lot longer. For those outliers,
// the CPU usage remains usually low, often nearing a single CPU
cpus = { check_max( 6 - (task.attempt-1), 'cpus' ) }
memory = { check_max( 1.GB * Math.pow(4, task.attempt-1), 'memory' ) }
time = { check_max( 10.h * Math.pow(4, task.attempt-1), 'time' ) }
// the CPU usage usually remains low, averaging a single CPU
cpus = { check_max( task.attempt == 1 ? 4 : 1, 'cpus' ) }
memory = { check_max( 2.GB, 'memory' ) }
time = { check_max( task.attempt == 1 ? 4.h : ( task.attempt == 2 ? 47.h : 167.h ), 'time' ) }
}
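The new `errorStrategy` closure above can be read as: retry a known-flaky exit code until the third attempt, then give up on that task; any other failure stops the run. A hypothetical Python mirror of that decision (the names `RETRYABLE` and `strategy` are illustrative, not pipeline code):

```python
# Mirrors the Groovy condition: task.exitStatus in ((130..145) + 104)
# Groovy's 130..145 range is inclusive on both ends.
RETRYABLE = set(range(130, 146)) | {104}

def strategy(exit_status, attempt, max_attempts=3):
    """Return the action the config would take for a failed BLAST_BLASTN task."""
    if exit_status in RETRYABLE:
        return "ignore" if attempt == max_attempts else "retry"
    return "finish"

print(strategy(137, 1))  # e.g. an OOM kill on the first attempt -> retry
print(strategy(137, 3))  # third failure of a known-flaky code -> ignore
print(strategy(2, 1))    # any other failure -> finish
```

Hard-coding `3` instead of `process.maxRetries` makes the "ignore" point explicit, matching the three-step resource schedule (4 h, 47 h, 167 h ≈ 7 days) below it.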

withName:CUSTOM_DUMPSOFTWAREVERSIONS {
25 changes: 19 additions & 6 deletions docs/usage.md
@@ -54,7 +54,20 @@ An [example samplesheet](assets/test/samplesheet.csv) has been provided with the
The pipeline can also accept a samplesheet generated by the [nf-core/fetchngs](https://nf-co.re/fetchngs) pipeline (tested with version 1.11.0).
The pipeline then needs the `--fetchngs_samplesheet true` option _and_ `--align true`, since the data files would all be unaligned.

## Getting databases ready for the pipeline
## Database parameters

Configure access to your local databases with the `--busco`, `--blastp`, `--blastx`, `--blastn`, and `--taxdump` parameters.

Note that `--busco` refers to the download path of _all_ lineages.
Then, when explicitly selecting the lineages to run the pipeline on,
provide the names of these lineages _with_ their `_odb10` suffix as a comma-separated string.
For instance:

```bash
--busco path-to-databases/busco/ --busco_lineages vertebrata_odb10,bacteria_odb10,fungi_odb10
```

### Getting databases ready for the pipeline

The BlobToolKit pipeline can be run in many different ways. The default way requires access to several databases:

@@ -65,7 +78,7 @@ The BlobToolKit pipeline can be run in many different ways. The default way requ

It is a good idea to add a date suffix to each database location so you know at a glance whether you are using the latest version. We use the `YYYY_MM` format, as we do not expect the databases to be updated more than once a month. However, feel free to use `DATE=YYYY_MM_DD` or a different format if you prefer.
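As a minimal sketch of that naming convention (the `/tmp/databases_demo` root and the directory names are examples, not pipeline requirements):

```shell
# Stamp each database directory with the YYYY_MM convention described above
DATE=$(date +%Y_%m)
DATABASES=/tmp/databases_demo   # pick your own root
mkdir -p "$DATABASES/taxdump_$DATE" "$DATABASES/nt_$DATE"
ls "$DATABASES"
```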

### 1. NCBI taxdump database
#### 1. NCBI taxdump database

Create the database directory and move into the directory:

@@ -82,7 +95,7 @@ Retrieve and decompress the NCBI taxdump:
```bash
curl -L ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz | tar xzf -
```
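Once the download finishes, a quick sanity check helps before pointing `--taxdump` at the directory. A hypothetical helper — `check_taxdump` is not part of the pipeline; the demo runs against a mock directory so it needs no download:

```shell
# Verify that the key taxdump files unpacked and are non-empty
check_taxdump() {
  for f in names.dmp nodes.dmp; do
    [ -s "$1/$f" ] || { echo "missing: $f"; return 1; }
  done
  echo "taxdump ok"
}

# Demo against a mock directory (no network needed)
mkdir -p /tmp/taxdump_demo
printf 'x' > /tmp/taxdump_demo/names.dmp
printf 'x' > /tmp/taxdump_demo/nodes.dmp
check_taxdump /tmp/taxdump_demo
```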

### 2. NCBI nucleotide BLAST database
#### 2. NCBI nucleotide BLAST database

Create the database directory and move into the directory:

@@ -106,7 +119,7 @@
```bash
tar xf taxdb.tar.gz -C $NT &&
rm taxdb.tar.gz
```

### 3. UniProt reference proteomes database
#### 3. UniProt reference proteomes database

You need [diamond blast](https://github.com/bbuchfink/diamond) installed for this step. The easiest way is probably using [conda](https://anaconda.org/bioconda/diamond). Make sure you have the latest version of Diamond (>2.x.x); otherwise the `--taxonnames` argument may not work.

@@ -140,7 +153,7 @@ zcat */*/*.idmapping.gz | grep "NCBI_TaxID" | awk '{print $1 "\t" $1 "\t" $3 "\t"
```bash
diamond makedb -p 16 --in reference_proteomes.fasta.gz --taxonmap reference_proteomes.taxid_map --taxonnodes $TAXDUMP/nodes.dmp --taxonnames $TAXDUMP/names.dmp -d reference_proteomes.dmnd
```
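The `awk` one-liner in the hunk above builds the four-column `taxid_map` that `diamond makedb --taxonmap` consumes: each `NCBI_TaxID` line of the idmapping files becomes `accession<TAB>accession<TAB>taxid<TAB>0`. A hypothetical Python equivalent of that transform, for illustration only:

```python
def idmapping_to_taxid_map(lines):
    """Mimic: grep "NCBI_TaxID" | awk '{print $1 "\t" $1 "\t" $3 "\t" 0}'"""
    rows = []
    for line in lines:
        fields = line.split()  # awk's default whitespace splitting
        if len(fields) >= 3 and fields[1] == "NCBI_TaxID":
            rows.append(f"{fields[0]}\t{fields[0]}\t{fields[2]}\t0")
    return rows

sample = [
    "A0A0C5B5G6\tNCBI_TaxID\t9606",
    "A0A0C5B5G6\tGene_Name\tMT-RNR1",  # filtered out, like the grep
]
print(idmapping_to_taxid_map(sample))  # one mapped row for the taxid line
```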

### 4. BUSCO databases
#### 4. BUSCO databases

Create the database directory and move into the directory:

@@ -232,7 +245,7 @@ List of tools for any given dataset can be fetched from the API, for example htt

| Dependency | Snakemake | Nextflow |
| ----------------- | --------- | -------- |
| blobtoolkit | 4.3.2 | 4.3.9 |
| blobtoolkit | 4.3.2 | 4.3.13 |
| blast | 2.12.0 | 2.14.1 |
| blobtk | 0.5.0 | 0.5.1 |
| busco | 5.3.2 | 5.5.0 |
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/chunk.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_CHUNK {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_CHUNK module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta) , path(fasta)
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/countbuscos.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_COUNTBUSCOS {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_COUNTBUSCOS module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), path(table, stageAs: 'dir??/*')
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/createblobdir.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_CREATEBLOBDIR {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_BLOBDIR module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), path(window, stageAs: 'windowstats/*')
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/extractbuscos.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_EXTRACTBUSCOS {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_EXTRACTBUSCOS module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), path(fasta)
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/summary.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_SUMMARY {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_SUMMARY module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), path(blobdir)
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/unchunk.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_UNCHUNK {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_UNCHUNK module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), path(blast_table)
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/updateblobdir.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_UPDATEBLOBDIR {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_BLOBDIR module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), path(input, stageAs: "input_blobdir")
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/updatemeta.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_UPDATEMETA {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_UPDATEMETA module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), path(input)
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/windowstats.nf
@@ -4,7 +4,7 @@ process BLOBTOOLKIT_WINDOWSTATS {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_WINDOWSTATS module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), path(tsv)
2 changes: 1 addition & 1 deletion modules/local/generate_config.nf
@@ -3,7 +3,7 @@ process GENERATE_CONFIG {
label 'process_single'

conda "conda-forge::requests=2.28.1 conda-forge::pyyaml=6.0"
container "docker.io/genomehubs/blobtoolkit:4.3.9"
container "docker.io/genomehubs/blobtoolkit:4.3.13"

input:
tuple val(meta), val(fasta)
