diff --git a/README.md b/README.md
index 82ea871..8d501e5 100644
--- a/README.md
+++ b/README.md
@@ -2,5 +2,5 @@ This repository aims to show users how to run `nf-core` pipeline with PEP as inp
 The command would look something like that:
 ```
-nextflow run nf-core/taxprofiler -profile test_pep --outdir /home/cgf8xr/nextflow-output
+nextflow run main.nf -profile test_pep,docker --outdir <OUTDIR>
 ```
diff --git a/developers_tutorial.md b/developers_tutorial.md
index 99092f7..6fb887c 100644
--- a/developers_tutorial.md
+++ b/developers_tutorial.md
@@ -1,5 +1,4 @@
 # Tutorial for integrating `nf-core` with PEP
-
 ## Introduction and summary
 This tutorial explains how to adapt `nf-core`
@@ -7,7 +6,7 @@
 An example implementation can be found in the `taxprofiler` [pipeline](https://nf-co.re/taxprofiler).
 A pull request with all the changes needed can be found here.
-The steps to accomplish that are as follows:
+The steps to accomplish PEP-`nf-core` integration for any `nf-core` pipeline are as follows:
 1. Rewrite all pipeline input checks to [PEP schema](http://eido.databio.org/en/latest/writing-a-schema/).
 2. If the script to check input does something more than input validation, then decouple the logic.
@@ -21,56 +20,53 @@
 Below is detailed explanation of these tasks as well as other information
 with additional resources that may be useful during implementation.
-## 1. Rewrite all pipeline input checks
-
-In general, `nf-core` pipelines usually consist of a `check_samplesheet.py`
-(or similarly named) Python script that is validates the
-`samplesheet.csv` file. This validation checks if all mandatory columns are present in the file,
-if all required columns have data, if extensions of the files are correct, etc.
-
-Here, we propose switching this approach to insetad use a PEP schema, so that the PEP validator (`eido`) can be used to accomplish
-all checks formerly performed by `check_samplesheet.py`. Example PEP schema for `taxprofiler`
-pipeline can be found here.
-
-## 2. Decouple in case of emergency
+## Steps to complete the integration
+### 1. Rewrite all pipeline input checks
+In general, `nf-core` pipelines include a `check_samplesheet.py`
+(or similarly named) Python script that is responsible for validating the
+`samplesheet.csv` file (e.g. checking that all mandatory columns are present in the file,
+that all required columns have data, that file extensions are correct, etc.).
+The goal of this task is to create a PEP schema from scratch, so that it exactly reflects
+all the checks from the `check_samplesheet.py` script.
+[An example PEP schema](https://github.com/nf-core/taxprofiler/pull/133/files#diff-abc09af6a9de56ba2e40d0fa32a4c0f8c2cd30a0299488c4d922453ad20f3100)
+for the `taxprofiler` pipeline is available in the pipeline code.
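+
+For illustration, a minimal schema reflecting typical samplesheet checks might look like
+the sketch below. The sample attributes (`sample`, `fastq_1`, `fastq_2`) and the regex
+patterns are an assumed example only; they should mirror exactly what your pipeline's
+`check_samplesheet.py` used to validate:
+```
+description: Example PEP schema reflecting the former samplesheet checks.
+imports:
+  - http://schema.databio.org/pep/2.0.0.yaml
+properties:
+  samples:
+    type: array
+    items:
+      type: object
+      properties:
+        sample:
+          type: string
+          description: Sample identifier (a mandatory, non-empty column)
+        fastq_1:
+          type: string
+          description: Path to the first FASTQ file
+          pattern: "^\\S+\\.f(ast)?q\\.gz$"
+        fastq_2:
+          type: string
+          description: Path to the second FASTQ file, for paired-end data
+          pattern: "^\\S+\\.f(ast)?q\\.gz$"
+      required:
+        - sample
+        - fastq_1
+required:
+  - samples
+```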
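+
+For reference, the PEP that users will pass to the pipeline and that is validated against
+such a schema is typically a small YAML config file pointing at the sample table, for example:
+```
+pep_version: 2.0.0
+sample_table: samplesheet.csv
+```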
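+
+Once the schema is written, the checks can be reproduced locally with the `eido` command
+line tool before wiring it into the pipeline (file paths here are illustrative):
+```
+eido validate pep/config.yaml -s assets/samplesheet_schema.yaml
+```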
+
+### 2. Decouple in case of emergency
 In some cases previously mentioned `check_samplesheet.py` script not only was supposed to validate the input
 files, but was also adding additional column with information what type of reads given row has.
 Since `eido` is a tool just for validation, one can't add any column by using `eido/validate`.
-The best option here is to identify (within `check_samplesheet.py`) the logic responsible for modification
-of the input file and move it to separate Python script (`bin/place_the_script_here.py`). That way one can
+The best option here is to identify (within `check_samplesheet.py`) the logic responsible for modification
+of the input file and move it to a separate Python script (`bin/place_the_script_here.py` in the `taxprofiler` source code). That way one can
 still remove all the logic responsible for validation and replace it with `eido`, and modify the input
 `samplesheet.csv` using newly extracted Python script.
-## 3. Add PEP as input parameter
+### 3. Update the `--input` parameter
 It will be good if all the pipelines will share a common interface, so that users can run PEP with all the
-pipelines the same way. To accomplish that, the `--pep` parameter should be added to the pipeline.
-Developer should allow pipeline to consume `--pep` parameter and make it mandatory to provide either `--input`
-or `--pep` when running a pipeline (by default user must always pass `--input`). In case of `taxprofiler` pipeline
-two files had to be edited: `lib/WorkflowMain.groovy` and `workflows/taxprofiler.nf`.
+pipelines in the same way. The developer should adjust the `--input` parameter so that it also accepts a PEP config.
-## 4. Adjust `nextflow_schema.json`
-This step is strongly coupled with `3. Add PEP as input parameter`. When adding new parameter to the pipeline,
-one must adjust the `nextflow_schema.json` to avoild validation errors. The only thing needed here is to tell
-that instead of one mandatory argument (`--input`), we will now have one of `[--input, --pep]` as mandatory.
+The developer must also update `nextflow_schema.json`: when a parameter changes, the schema
+must be adjusted to avoid validation errors. The only change needed here is to allow passing
+`yaml` files in the schema, e.g. by extending the `pattern` of the `input` property so that it
+also matches `.yaml`/`.yml` extensions.
-## 5. Install `eido` modules
+### 4. Install `eido` modules
 Eido is currently added as a module to `nf-core` modules. That way it can be shared across all the pipelines.
 To be able to use `EIDO_VALIDATE` and `EIDO_CONVERT` commands in the pipeline, the developer first must install the
-modules for current pipeline. Tutorial how to do it can be found
-[here](https://nf-co.re/tools/#install-modules-in-a-pipeline).
+modules for the current pipeline. There is a tutorial available on [how to install modules in a pipeline](https://nf-co.re/tools/#install-modules-in-a-pipeline).
-## 6. Adjust the workflow responsible for input check
+### 5. Adjust the workflow responsible for input check
 When incorporating new modules, the workflow will change. In my case changes were needed in
 `modules/local/samplesheet_check.nf` and `subworkflows/local/input_check.nf`.
-## 7. Create test config
+### 6. Create test config
 Developer should create test config so that user can run pipeline with PEP as input with minimal effort.
-In order to do it, new config profile should be added as shown in `taxprofiler` pull request.
+In order to do it, a new config profile should be added, as shown in the `taxprofiler` [pull request containing
+all changes](https://github.com/nf-core/taxprofiler/pull/133/files#diff-13b96be1e48daf716d5ac39dae9f905df6a0e0d4af0232e3f5c36fd52a178862).
+The config will contain the minimal setup allowing an analysis to be run using PEP files.
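+
+Such a profile might look roughly like the sketch below; the profile name, resource limits
+and test data URL are illustrative placeholders, not the exact `taxprofiler` values:
+```
+params {
+    config_profile_name        = 'Test PEP profile'
+    config_profile_description = 'Minimal test dataset to check pipeline function with PEP input'
+
+    // Limit resources so that the test can run on CI infrastructure
+    max_cpus   = 2
+    max_memory = '6.GB'
+    max_time   = '6.h'
+
+    // PEP config consumed via --input (illustrative URL)
+    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/taxprofiler/pep/config.yaml'
+}
+```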
-## 8. Other information
-### Biocontainers
+## Other information
+### How to add a tool to biocontainers
 In general all necessary modules (`eido/validate` and `eido/convert`) are already added to `nf-core modules`,
 but it may happen that the developer will need to add other tools. In order to do it, it's good to know how
 this works for `nf-core`.
 To be able to use any container in `nf-core` pipelines they should be hosted on
@@ -79,5 +75,5 @@ There are two ways to accomplish that:
 1. Put `peppy` to `bioconda`. This is the easiest way, and when `peppy` is available in `bioconda`, then
    `biocontainers` provide an automated container creation for this tool.
-2. Manually add `peppy` to biocontainers. Detailed tutorial how to do it is available
-   [here](https://biocontainers-edu.readthedocs.io/en/latest/contributing.html).
+2. Manually add `peppy` to biocontainers. There is a detailed
+   [tutorial on how to add a tool to biocontainers](https://biocontainers-edu.readthedocs.io/en/latest/contributing.html) available.
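+
+For the first option, a `bioconda` recipe is a single `meta.yaml` file submitted to the
+`bioconda-recipes` repository. A rough sketch is shown below; the version, checksum and
+dependency list are placeholders that must be taken from the actual `peppy` release:
+```
+package:
+  name: peppy
+  version: "0.35.0"  # placeholder version
+
+source:
+  url: https://pypi.io/packages/source/p/peppy/peppy-0.35.0.tar.gz
+  sha256: "<sha256 of the release tarball>"  # placeholder checksum
+
+build:
+  noarch: python
+  number: 0
+  script: "{{ PYTHON }} -m pip install . --no-deps -vv"
+
+requirements:
+  host:
+    - python >=3.6
+    - pip
+  run:
+    - python >=3.6
+    - pandas  # illustrative runtime dependencies
+    - pyyaml
+
+test:
+  imports:
+    - peppy
+```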